Daniel sent us a prompt about the Hermes agent framework — he's watching people leave OpenClaw for it, wants to know why, and what the other personal AI productivity frameworks even are right now. But the deeper question is about architecture. He's a heavy Claude Code user, built his own MCP servers, but he's chained to his workstation. He wants something that lives on a server, integrates his tools, and lets him do serious development work remotely — including long coding sessions with sub-agents and bash output — from something as limited as a mobile interface. So there's a surface question about frameworks, and a harder one about whether any of them actually solve the mobile-to-server interaction problem.
The mobile-to-server interaction problem is the right way to frame it, because that's where most of these frameworks quietly fall apart. Let me start with the surface question first, because the Hermes-versus-OpenClaw shift is actually a real phenomenon and it tells us something about what developers are prioritizing right now.
Before you dive in — I've heard the name Hermes thrown around, but I want to make sure we're talking about the same thing. This is the open-source agent framework that's been picking up steam in the last few months?
That's the one. Hermes is an open-source agentic AI framework designed specifically for personal productivity and automation. It's not an enterprise orchestration platform, it's not trying to be a no-code drag-and-drop thing. It's aimed at developers who want an AI agent running persistently on their own infrastructure, with tool access, memory, and the ability to chain complex tasks. The project really started gaining momentum in late twenty twenty-five, and by early this year it had become the default recommendation for anyone who wanted a self-hosted agent that wasn't OpenClaw.
OpenClaw was the incumbent?
OpenClaw was the darling of twenty twenty-four and early twenty twenty-five. It was one of the first frameworks to make the "personal AI agent on your own server" idea feel real and usable. But over the last year, it's been bleeding users to Hermes, and the reasons are instructive. The biggest one is development velocity. OpenClaw's maintainers slowed down dramatically — there was a period of about six months where the project saw almost no meaningful commits to the core agent loop. Meanwhile, Hermes was shipping features every couple of weeks.
The classic open-source story. The hot new project ships fast, the incumbent gets comfortable, the community migrates.
It's that, but there's a specific technical dimension too. OpenClaw was built around a particular architecture for tool calling and memory that turned out to be fairly rigid. Extending it felt like fighting the framework. Hermes was designed from the ground up with a plugin architecture that treats tools, memory backends, and even the agent's reasoning loop as composable components. If you want to swap out the default memory store for a vector database you already have running, Hermes makes that a configuration change. In OpenClaw, that was a fork-the-repository kind of situation.
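To make the "configuration change" point concrete, here is a minimal sketch of a swappable memory backend. Everything in it is hypothetical: `MemoryBackend`, `BACKENDS`, `build_agent`, and the `memory_backend` config key are illustrative names, not Hermes's actual API. It only shows the general pattern of an agent depending on an interface rather than on one concrete store.

```python
from typing import Dict, Optional, Protocol


class MemoryBackend(Protocol):
    """Hypothetical interface: any object with these two methods can back the agent."""

    def remember(self, key: str, value: str) -> None: ...

    def recall(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Default backend: a plain dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self._data[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self._data.get(key)


class NamespacedStore(InMemoryStore):
    """Stand-in for an external store (for example a vector database you already run)."""

    def remember(self, key: str, value: str) -> None:
        super().remember(f"agent:{key}", value)

    def recall(self, key: str) -> Optional[str]:
        return super().recall(f"agent:{key}")


# Registry mapping a config value to a backend class (assumed names).
BACKENDS = {"in_memory": InMemoryStore, "namespaced": NamespacedStore}


class Agent:
    def __init__(self, memory: MemoryBackend) -> None:
        self.memory = memory


def build_agent(config: dict) -> Agent:
    """Pick the backend from configuration instead of editing agent code."""
    return Agent(memory=BACKENDS[config["memory_backend"]]())
```

Swapping stores is then one changed value, `build_agent({"memory_backend": "namespaced"})`, with no edits to `Agent` itself. The rigid alternative described above corresponds to `Agent` constructing its own store internally.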
It's the difference between a framework that tolerates customization and one that was built for it.
And that matters enormously when you're building something like what the prompt describes — custom MCP servers, specific tool integrations, a particular workflow that doesn't fit the default mold. Hermes treats your tools as first-class citizens. OpenClaw treated them as plugins that had to adapt to its internal model.
The Nvidia connection — the prompt mentioned Nvidia invested in or supported Hermes. Is that real?
Nvidia's venture arm participated in a funding round for the company behind Hermes earlier this year. More interesting than the money is what it signals. Nvidia has been strategically backing infrastructure and tooling companies that sit between raw models and end-user applications. They backed LangChain, they've invested in vector database companies, and Hermes fits that pattern. They're betting that the agent runtime layer is going to be a real category, not just a passing fad.
The "agent runtime layer" — that's a useful phrase. So we're not just talking about a wrapper around an API call anymore.
And that's actually the right segue into the second question — what the other frameworks are. The landscape has consolidated around a few distinct approaches. You've got Hermes and OpenClaw as the self-hosted personal agent frameworks. Then you've got Claude Code, which is Anthropic's agentic coding tool — tightly coupled to the Claude model and designed specifically for software development. You've got Cursor and Copilot in the IDE-integrated space. You've got task-specific agents like Devin for autonomous coding. And then there's a category of frameworks that are more like agent construction kits — CrewAI, AutoGen, LangGraph — where you're building multi-agent systems from lower-level primitives.
The prompt's use case sits uncomfortably between several of those categories.
That's exactly the problem. The prompt describes something that sounds like Claude Code — long development sessions, delegation to sub-agents, bash output — but decoupled from the workstation and accessible from a phone. Claude Code is phenomenal at what it does, but it's fundamentally a terminal application. It assumes you're sitting at a keyboard, looking at a screen, with a filesystem underneath you. Decoupling that from the local machine is not a trivial UI problem.
Let's sit with that for a second. The prompt says "I might want to engage in what I do with Claude Code, which might be complex, long development sessions on a code repository involving delegation to agents and sub-agents. That involves doing something quite different, like seeing bash output. So I'm not quite sure how that could work elegantly if I was accessing it from a mobile UI." That's the real tension. It's not just about remote access. It's about whether the interaction model can survive the jump from terminal to phone.
The honest answer is: nobody has fully solved this yet. Hermes has a Telegram integration that's actually one of its most popular interfaces. You connect your Hermes agent to a Telegram bot, and you can send it messages, ask it to run tasks, and get responses back. It works for a certain class of interactions — "check my inbox," "what's on my calendar," "run this script and tell me the result." But the moment you're in a long coding session with sub-agents spawning and bash commands returning multi-line output, Telegram becomes a terrible interface. You're scrolling through walls of text, code blocks get mangled, and the conversation context gets unwieldy.
The musical equivalent of beige wallpaper.
It's worse than that. It's like trying to conduct an orchestra through a keyhole. Hermes does have a web dashboard that's better — you can see terminal output, manage multiple sessions, review agent reasoning traces. But on mobile, the web dashboard is still a desktop interface squeezed into a phone screen. It's not designed for mobile-first interaction.
OpenClaw's mobile story was even weaker. The web UI wasn't responsive, the chat integrations were basic, and there was no real attempt to solve the "long-running development session on a phone" problem. That's another reason people have been leaving — Hermes at least acknowledges the problem and has been iterating on their interfaces. OpenClaw's interface work basically stalled.
Neither framework has cracked the mobile-to-server interaction for serious development work. But let me ask a different question — does the problem actually need to be solved with a unified interface? Could the answer be: use the right interface for the right task?
That's a genuinely smart way to think about it, and it's where I think the pragmatic answer lies. You don't need to do everything from your phone. What you need is the ability to start, monitor, and intervene in long-running sessions from your phone, while still having the full terminal experience available when you're at a proper machine. The phone becomes the control plane, not the data plane.
The control plane versus data plane distinction is useful. You're not trying to read a thousand lines of compiler output on a six-inch screen. You're checking whether the build succeeded, seeing a summary of what the agent did, and maybe giving it a new instruction.
And both frameworks can do that to some degree, but Hermes does it better because its agent state is more transparent. You can query what the agent is currently doing, what sub-agents are active, what the last output was, and you can inject new instructions into a running session. OpenClaw's session management was more opaque — the agent was a black box while it was running, and you just had to wait for it to finish.
That's a concrete advantage. If I'm on a train and I want to check whether my agent finished refactoring that module, I don't want to SSH into a server and tail logs. I want to send a message that says "status" and get back "completed three of seven tasks, currently running unit tests, two failures so far."
Hermes can do that. It has a structured status reporting system that summarizes agent activity in a way that's readable in a chat interface. It's not perfect — the summaries can be too terse or too verbose depending on the task — but the architecture supports it. OpenClaw never really built that. The agent ran, and when it was done, you got the output. There was no mid-flight observability.
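As an illustration of the kind of chat-sized status reply described here, the following sketch turns a structured session record into one short line. The `SessionStatus` fields and the `summarize` function are assumptions made for the example; they are not taken from Hermes's status system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionStatus:
    """Hypothetical snapshot of a running agent session."""

    total_tasks: int
    completed_tasks: int
    current_step: str
    failures: int = 0
    active_subagents: List[str] = field(default_factory=list)


def summarize(status: SessionStatus) -> str:
    """Render the snapshot as one line that fits comfortably in a chat message."""
    parts = [
        f"completed {status.completed_tasks} of {status.total_tasks} tasks",
        f"currently {status.current_step}",
    ]
    if status.failures:
        noun = "failure" if status.failures == 1 else "failures"
        parts.append(f"{status.failures} {noun} so far")
    if status.active_subagents:
        parts.append("sub-agents: " + ", ".join(status.active_subagents))
    return ", ".join(parts)


print(summarize(SessionStatus(7, 3, "running unit tests", failures=2)))
```

The point of the design is that the summary is derived from structured state rather than from scraping logs, which is what makes mid-flight observability cheap to expose over chat.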
Let me pull on another thread. The prompt mentions custom MCP servers. For listeners who might not be deep in this world, MCP is the Model Context Protocol — it's how AI agents connect to external tools and data sources. The prompt's author has built his own MCP servers. How do Hermes and OpenClaw handle custom MCP integrations?
This is where Hermes really pulls ahead, and it's probably the single biggest technical reason for the migration. Hermes treats MCP as a native integration protocol. You can point it at your existing MCP servers, and it discovers their capabilities, connects to them, and makes them available to the agent as tools. If you've already built MCP servers for your workflow, you can plug them into Hermes with essentially zero additional work.
OpenClaw predates MCP. It has its own tool definition format, and while there have been community efforts to bridge the two, it's not native. You'd need to write adapters, translate tool schemas, deal with authentication mismatches. It's doable, but it's friction. For someone who's already invested in building MCP servers, Hermes is the path of least resistance by a wide margin.
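The adapter work mentioned here can be sketched roughly as follows. The input shape (`name`, `description`, `inputSchema` holding a JSON Schema object) follows the MCP tool definition. The output shape (`id`, `help`, `params`) is an invented stand-in for a framework-specific tool format and is not OpenClaw's real schema.

```python
def mcp_tool_to_legacy(tool: dict) -> dict:
    """Translate one MCP tool definition into a hypothetical legacy format."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    params = [
        {
            "name": name,
            # Default to "string" when the JSON Schema omits a type.
            "type": spec.get("type", "string"),
            "required": name in required,
        }
        for name, spec in schema.get("properties", {}).items()
    ]
    return {
        "id": tool["name"],
        "help": tool.get("description", ""),
        "params": params,
    }


example = {
    "name": "search_notes",
    "description": "Full-text search over personal notes",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["query"],
    },
}
print(mcp_tool_to_legacy(example))
```

Even this toy version drops information, such as nested objects and enum constraints, which hints at why a non-native bridge adds friction beyond the initial translation.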
We've got three reasons for the shift: development velocity, architectural flexibility, and native MCP support. That's a coherent story. But let me ask the skeptical question — is Hermes actually good, or is it just better than OpenClaw? Those are different bars.
That's the right question. Hermes is good, but it's not without problems. Its documentation is patchy in places — certain advanced features are documented mostly through GitHub issues and Discord conversations. The configuration system is powerful but can be overwhelming, and the defaults aren't always sensible. And because it's moving fast, breaking changes happen. If you're running a Hermes agent in production for something you depend on, you need to be comfortable with occasional maintenance overhead.
The "move fast and break your personal infrastructure" approach.
That's a little uncharitable, but not entirely wrong. The Hermes team has been responsive about fixing regressions, but they're clearly prioritizing new capabilities over stability right now. For a personal productivity tool, that's probably the right trade-off — you want the new features, and an hour of downtime isn't catastrophic. But it's worth knowing going in.
Let's talk about the other frameworks in the landscape. You mentioned CrewAI, AutoGen, LangGraph. Where do those fit?
Those are more like agent construction kits. They give you primitives for building multi-agent systems — defining agents with different roles, setting up communication patterns between them, managing shared state. They're powerful but they're lower-level. You don't install CrewAI and get a personal assistant. You use CrewAI to build a system where one agent researches, another writes, another reviews. It's a framework for building agent systems, not an agent itself.
They're tools for building the thing, not the thing itself.
And for the use case in the prompt — a personal AI productivity tool that lives on a server and does complex development work — you'd be building a lot from scratch on top of those frameworks. Session management, persistent memory, chat integrations, tool discovery — Hermes gives you those out of the box. With CrewAI or LangGraph, you're implementing them yourself.
Then there's the managed service approach — things like Devin. Where do those sit?
Devin is interesting because it's specifically designed for the "complex development sessions with sub-agents" part of the prompt. It's an autonomous coding agent that can plan, implement, test, and iterate on software projects. It runs in the cloud, so it's decoupled from your workstation. And it has a web interface that works on mobile, though again, it's not mobile-optimized for the deep coding use case.
It's not your infrastructure.
That's the trade-off. Devin runs on Cognition's servers, not yours. You don't control the environment, you can't plug in your custom MCP servers, and you're dependent on their pricing and availability. For someone who's already built custom MCP integrations and wants to run things on their home server, it's probably a non-starter.
The prompt specifically says "maybe my home server." That suggests control matters.
And that's actually another point in Hermes's favor — it's designed to run on modest hardware. You can run a Hermes agent on a Raspberry Pi, a home server, an old laptop. It's not resource-intensive in its own right, though the LLM it's calling might be. If you're using a cloud API for the model, the agent itself can run on very lightweight infrastructure.
Let me circle back to the interaction problem, because I think it's the most interesting part of this prompt. The prompt's author is essentially asking: can I do serious development work through a chat interface on my phone? And the answer seems to be "not really, but you can do something adjacent that might be good enough." What does "good enough" look like in practice?
I think "good enough" looks like this. You start a development session from your laptop when you're at your desk. You define the goal, set up the context, maybe do some initial pair-programming with the agent. Then you close your laptop and go about your day. From your phone, you can check in — see what the agent is doing, review summaries of progress, make high-level decisions. "That approach looks wrong, try a different data structure." "The tests are passing, go ahead and open a PR." "Hold on, I want to review that database migration before you run it." You're not writing code on your phone. You're directing an agent that's writing code.
That's a different mental model than "Claude Code on a phone." It's more like being a tech lead who checks in on their team throughout the day.
That's exactly the right analogy. And it works better than you might expect, because a lot of development work is decision-making, not typing. The bottleneck isn't keystrokes. It's deciding what to build, evaluating trade-offs, reviewing output. Those are things you can do from a phone.
There's a catch, isn't there? The agent has to be trustworthy enough that you're not constantly needing to intervene.
That's the real limiting factor, and it's not a framework problem — it's a model capability problem. If the agent goes off the rails and spends three hours building the wrong thing, the fact that you could theoretically check in from your phone doesn't help if you didn't actually check in. The control plane model only works if the agent can operate autonomously for meaningful periods without catastrophic errors.
We're not quite there yet.
We're getting closer. Claude and the other frontier models are much better at sustained autonomous work than they were a year ago. But "much better" isn't "reliable enough to trust with a production codebase while you're at the grocery store." The failure mode is still that the agent makes a plausible-seeming but wrong architectural decision that cascades into hours of wasted work.
The frameworks exist, the interaction model is viable in theory, but the model reliability is the binding constraint.
For the full vision, yes. For simpler workflows — "check my PRs, run the test suite on this branch, update the dependencies" — we're already there. Hermes handles those kinds of tasks reliably today. The long, complex, multi-agent development sessions are where the reliability starts to fray.
Let me ask about something the prompt hinted at but didn't fully articulate. There's a frustration with being bound to a workstation that's about more than just mobility. It's about the agent not being persistent. You close your laptop, the agent goes away. You want an agent that's always running, always available, that you can hand tasks to asynchronously. That's a different paradigm than the synchronous terminal session.
That's a profound point, and it's actually the paradigm shift that frameworks like Hermes are trying to enable. The synchronous terminal session — you type a command, the agent responds, you type another command — is fundamentally limited. It's a conversation. The asynchronous agent model is more like delegation. You assign a task, the agent works on it, you check in when you want to, and the agent notifies you when something needs your attention. It's the difference between a phone call and a project management tool.
That's a better fit for how complex work actually happens. You don't sit and watch a build compile. You start it, go do something else, and come back.
And Hermes is designed around that asynchronous model. Tasks are persistent, they have state, they can run for hours or days, and you can interact with them from multiple devices. The Telegram integration, for all its limitations as a development interface, is actually a good fit for the asynchronous check-in model. You get a notification when the agent needs input, you respond, you go back to whatever you were doing.
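One way to picture the asynchronous model is a task record that outlives any single connection: it can be serialized, restored on another device, and it flips into a "needs input" state that triggers a notification. This is a sketch under assumed names (`Task`, `State`, `ask`, `reply`), not a description of how Hermes stores tasks.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class State(str, Enum):
    RUNNING = "running"
    NEEDS_INPUT = "needs_input"
    DONE = "done"


@dataclass
class Task:
    goal: str
    state: State = State.RUNNING
    log: List[str] = field(default_factory=list)

    def ask(self, question: str) -> str:
        """Agent pauses and returns the notification text to push to the user."""
        self.state = State.NEEDS_INPUT
        self.log.append(f"agent: {question}")
        return f"[{self.goal}] needs input: {question}"

    def reply(self, answer: str) -> None:
        """User answers from any device; the task resumes."""
        if self.state is not State.NEEDS_INPUT:
            raise ValueError("task is not waiting for input")
        self.log.append(f"user: {answer}")
        self.state = State.RUNNING

    def to_json(self) -> str:
        # State subclasses str, so it serializes as its plain string value.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Task":
        data = json.loads(raw)
        return cls(goal=data["goal"], state=State(data["state"]), log=data["log"])
```

Because the whole task round-trips through JSON, the process that asked the question does not have to be the one that receives the answer, which is what distinguishes this from a synchronous terminal session.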
The notification-driven development workflow. That's actually kind of appealing.
It is, and I think it's where personal AI productivity is heading. Not trying to replicate the terminal experience on a phone, but building a new interaction model that's native to how people actually work across devices and contexts.
Let's zoom out for a second. The prompt asks about frameworks, and we've covered Hermes, OpenClaw, and the broader landscape. But there's an implicit question about whether any of this is worth doing right now, or whether it's still too early. What's your read on the maturity of the self-hosted agent space?
It's in the "works for enthusiasts, not ready for normal people" phase. If you're comfortable with Docker, YAML configuration, debugging API calls, and occasionally reading source code to figure out why something isn't working, Hermes is totally usable and useful. If you want something that just works out of the box with no tinkering, we're not there yet.
The prompt's author — someone who builds their own MCP servers and uses Claude Code daily — is exactly the target audience.
They're the ideal Hermes user. They have the technical skills to set it up, they have existing infrastructure they want to integrate, and they have a clear use case that the framework is designed for. The question isn't whether Hermes can do what they want. It's whether the interaction model — particularly on mobile — will actually work for their specific workflow.
The answer to that is "partially, with caveats."
Partially, with caveats, and getting better every few weeks. The Hermes team has been explicit that improving cross-platform interaction is a priority. They've talked about a mobile-native app, better notification handling, and more structured output formatting for small screens. None of that exists yet in a polished form, but the direction is clear.
What about the competition? Is anyone else trying to solve this specific problem — the mobile-to-server agent interaction for development work?
There are a few interesting projects. There's a tool called Aider — a terminal-based AI coding assistant, similar to Claude Code in some ways, but it has an experimental web UI that's designed to be mobile-friendly. It's not a full agent framework, but it's tackling the same interaction problem from the coding-specific angle. There's also Continue, which started as an IDE extension but has been building out a server component that lets you interact with your coding agent remotely. But neither is a general productivity framework with MCP integration. The prompt's author wants something broader — an agent that can do development work but also handle other productivity tasks, all with their custom tool integrations. Hermes is the closest thing to that vision in a single framework.
Let me ask one more question, and it's the one I think is actually hardest. The prompt describes delegating to agents and sub-agents. How well does Hermes handle that kind of hierarchical agent structure?
This is an area where Hermes is ahead of OpenClaw but still figuring things out. Hermes supports sub-agents: you can define an agent that spawns other agents for specific subtasks. The sub-agents inherit the parent's tool access and context, they run in isolated sessions, and they report results back to the parent. It works for simple delegation patterns like "write tests for this module," "research this API," "refactor this file."
The coordination gets messy when sub-agents need to share state or when the parent needs to make decisions based on partial results from multiple sub-agents. The programming model is basically "spawn and wait," which works for parallelizable tasks but breaks down for anything that requires dynamic reallocation of work. If sub-agent A finishes early and sub-agent B is struggling, the parent can't easily reassign part of B's work to A. The orchestration is static.
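The "spawn and wait" pattern can be sketched with plain `asyncio`. This is a generic illustration, not Hermes code: `sub_agent` and `parent` are hypothetical names, and `asyncio.sleep` stands in for real work.

```python
import asyncio
from typing import List, Tuple


async def sub_agent(name: str, task: str, seconds: float) -> str:
    """Stand-in for a sub-agent; the sleep simulates the time its task takes."""
    await asyncio.sleep(seconds)
    return f"{name} finished: {task}"


async def parent(assignments: List[Tuple[str, str, float]]) -> List[str]:
    """Static orchestration: assign everything up front, then wait for all of it."""
    jobs = [sub_agent(name, task, seconds) for name, task, seconds in assignments]
    # gather returns results in submission order, only once every job is done.
    return await asyncio.gather(*jobs)


results = asyncio.run(
    parent([("A", "write tests", 0.01), ("B", "refactor module", 0.05)])
)
print(results)
```

Sub-agent A finishes first and then sits idle until B completes, because nothing in `parent` looks at partial results or hands out new work. Dynamic reallocation would need something like a shared work queue that idle sub-agents keep pulling from, which is a different and harder design.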
That's a hard computer science problem, not just a framework limitation.
It is, and to be fair, nobody has solved it well in the open-source agent space. The commercial platforms have more sophisticated task allocation, but they're also much more constrained in what kinds of tasks they support. Hermes is trying to be general-purpose, which makes the orchestration problem harder.
For the prompt's use case — complex development sessions with delegation — the sub-agent model works for the straightforward cases but would need manual intervention for anything tricky.
And that manual intervention is where the mobile interface question becomes acute. If you need to step in and rebalance work across sub-agents, can you do that effectively from a phone? The answer today is: barely. You can send new instructions, you can cancel and restart sub-agents, but the visibility into what each sub-agent is actually doing is limited. You're managing a team through text messages, which is about as effective as it sounds.
The remote manager who only communicates via Slack.
We all know how that goes.
Alright, let me try to synthesize what we've covered for the prompt's specific situation. You're a heavy Claude Code user with custom MCP servers. You want to decouple from your workstation. Your best bet is Hermes, running on your home server, with your MCP servers connected natively. You'll use the Telegram integration for quick check-ins and status updates, the web dashboard for more detailed review when you're at a machine with a real screen, and you'll accept that the full Claude Code terminal experience doesn't translate to mobile — but the asynchronous delegation model might actually be better for how you want to work anyway.
That's a fair summary. I'd add two practical notes. First, set up Hermes alongside Claude Code, not as a replacement. Use Claude Code when you're at your desk doing focused development. Use Hermes for the asynchronous, always-on tasks — running tests, monitoring repos, handling routine automation. Over time, as Hermes and the underlying models improve, you can shift more of the complex work to it.
The second note?
Expect to tinker. Hermes is improving fast, but it's not a finished product. You'll hit rough edges, you'll need to update your configuration as the framework evolves, and you'll occasionally wonder why something that should be simple isn't working. If you enjoy that kind of thing — and someone who builds their own MCP servers probably does — it's a rewarding platform. If you want something that stays out of your way, give it another six to twelve months.
The "fun for tinkerers, frustrating for everyone else" phase.
The most honest product category there is.
One thing we haven't touched on — security. Running an agent on your home server that has access to your code, your tools, potentially your email and calendar. How do these frameworks handle that?
It's the question nobody wants to ask but everyone should. Both Hermes and OpenClaw run locally and don't send your data to third parties by default — the agent runs on your infrastructure, and the only external calls are to the LLM provider you configure. But "by default" is doing a lot of work there. If you connect your agent to a Telegram bot, your messages are going through Telegram's servers. If you use the web dashboard without proper authentication, you're exposing an interface to the internet. The security model is: you're responsible for securing your own deployment.
Which is fine for the target audience, but worth stating explicitly.
It's worth stating explicitly because the convenience features — Telegram integration, remote access, mobile notifications — all create potential attack surfaces. Hermes's documentation covers this reasonably well, but it's not a "deploy and forget" situation. You need to think about authentication, network exposure, and what tools you're giving the agent access to. If your agent can run arbitrary bash commands and it's accessible via an unsecured Telegram bot, you've basically built a remote shell for anyone who finds the bot.
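A minimal sketch of the gatekeeping this implies for a chat-driven agent: check who is sending before looking at what they sent, and only accept a short list of commands. The chat ID, the command names, and the `authorize` function are all placeholders invented for the example, and a real deployment would need more than this (rate limiting, audit logs, sandboxing of whatever actually runs).

```python
import shlex
from typing import List

# Placeholder values: replace with your own chat ID and command set.
ALLOWED_CHAT_IDS = {111111111}
ALLOWED_COMMANDS = {"status", "run-tests"}


def authorize(chat_id: int, text: str) -> List[str]:
    """Return the parsed command if sender and command are both allowed."""
    if chat_id not in ALLOWED_CHAT_IDS:
        raise PermissionError("unknown chat")
    try:
        argv = shlex.split(text)
    except ValueError:
        # Unbalanced quotes and similar malformed input are rejected outright.
        raise PermissionError("malformed command") from None
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError("command not allowed")
    # The caller should dispatch on argv directly and never pass text to a shell.
    return argv
```

The allowlist is deliberately narrower than "arbitrary bash": the agent can still do useful work, but a stranger who finds the bot gets a refusal rather than a remote shell.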
The "I accidentally put my home server on the internet" problem.
A classic of the genre.
Let me pivot slightly. The prompt mentions that the interaction interfaces are poor, and that seems to be a known challenge across the board. Why is that? Why is the interface layer consistently the weakest part of these frameworks?
I think there are two reasons. The first is that the people building these frameworks are backend engineers. They're good at agent loops, tool integration, memory management. UI design is a different skill set, and open-source projects in particular struggle to attract UI talent. The second reason is that the interaction problem is hard. You're trying to design an interface for an agent that can do dozens of different things, from reading email to writing code to controlling smart home devices. A chat interface is the lowest common denominator, but it's a terrible fit for many of those tasks.
The chat interface as the default not because it's good, but because it's easy.
Because the underlying paradigm — you send a message, the agent responds — maps naturally to chat. But as soon as the agent is doing something that doesn't fit the message-response pattern, the chat metaphor breaks down. Long-running tasks, streaming output, structured data, multi-step workflows with branching decisions — none of these are natural in a chat interface.
What would a better interface look like?
I think it looks more like a project management tool than a chat app. You have tasks with statuses, you have streams of activity you can drill into, you have notifications for things that need your attention, and you have a chat component for the moments when natural language is actually the right interaction mode. But nobody has built that yet for personal AI agents. The enterprise agent platforms have something closer to it, but they're designed for teams and business processes, not individual productivity.
We're waiting for someone to build the Linear for AI agents.
That's exactly the thing. And I suspect it'll come, but it's probably a year or two out. In the meantime, we're making do with Telegram bots and web dashboards.
Before we wrap, let's do a quick scorecard. For the prompt's specific needs — self-hosted, MCP-integrated, mobile-accessible, capable of complex development sessions with sub-agents — how do the main options stack up?
Hermes is the clear leader. Native MCP support, active development, reasonable mobile story via Telegram and web dashboard, sub-agent support that works for straightforward cases. OpenClaw is behind on all of those dimensions and losing momentum. Claude Code is the best at the actual development work but tied to the workstation. Devin is cloud-only and doesn't integrate with custom tools. The construction-kit frameworks — CrewAI, LangGraph — give you maximum flexibility but require you to build everything yourself.
The recommendation is Hermes, with the understanding that it's a partial solution and the interaction model is still evolving.
That's it. Set it up on the home server, connect the MCP servers, use it for the asynchronous, always-on workflows, and keep Claude Code for focused terminal sessions. The two tools complement each other nicely. And watch the Hermes changelog — the team is shipping fast, and the mobile experience in particular is likely to improve significantly over the next few months.
For someone who's not ready to commit to self-hosting? Is there a middle ground?
The middle ground is using Claude Code with a remote development setup — VS Code Server, a cloud VM, something like that. It doesn't solve the mobile interaction problem, but it decouples you from a specific physical workstation. You can SSH in from anywhere, attach to your persistent session, and work. It's not the asynchronous agent dream, but it's a practical step that works today.
A bridge solution while the frameworks mature.
And honestly, that's where I think most people should be right now. The self-hosted agent frameworks are the future, but the future isn't evenly distributed yet.
We've covered the frameworks, the interaction problem, the security considerations, and the practical path forward. I think we've answered the prompt.
And I'll say — this is one of those prompts where the surface question opened up something much more interesting. The framework comparison is useful, but the real conversation is about how we interact with AI agents and what "working with an AI" actually looks like when it's not a synchronous terminal session. That's the thing that's going to define the next few years of this space.
The frameworks will come and go. The interaction model is the thing that sticks.
Which is basically what you said about protocols versus frameworks a while back. Same principle, different layer.
Don't quote me to me.
I'm agreeing with you.
It's unsettling.
And now: Hilbert's daily fun fact.
Hilbert: In the seventeen twenties, French explorer André Brue descended into a cave system in what is now Mali and documented a species of blind catfish that had evolved to detect prey through vibrations in total darkness. The fish measured roughly four inches long — about the length of a standard credit card.
...right.
The question I'm left with is whether the asynchronous agent model — the always-on, check-in-from-anywhere approach — actually changes how people think about what they can build. If you're not sitting there watching the agent work, you might give it bigger tasks. You might let it run overnight. You might discover that the bottleneck in your productivity wasn't the tool, it was your own attention span.
That's the optimistic case. The pessimistic case is that you spend more time fixing what the agent broke while you weren't watching than you would have spent just doing the work yourself.
Probably some of both, for a while.
That's the phase we're in, yeah.
This has been My Weird Prompts, produced by Hilbert Flumingtop. You can find every episode at myweirdprompts dot com. If you got something out of this one, leave us a review — it helps.
Until next time.