So Daniel sent us this one, and I have to say it lands close to home. He's got three or four speakers running on Raspberry Pis and Nano Pis, and he wants to play podcasts and audio libraries across multiple rooms. Simple enough request in theory. But he's been living with the Home Assistant ecosystem, specifically Snapcast and Music Assistant, and describes the experience as... brittle. Things break, integrations fall over, and the overhead of keeping it all running starts to feel like a second job. The question is whether there's a path to reliable casual multi-room audio that doesn't require Home Assistant as the glue holding everything together. And also, can any of these solutions actually serve as a unified playback source for something like Kodi or Plex?
The framing Daniel uses is exactly the right one to start with, actually. There's a real tension in how people talk about multi-room audio, because the audiophile use case and the casual use case have almost nothing in common beyond "sound comes out of more than one speaker." Sonos built a business on the audiophile end. Tight sync, high quality, closed ecosystem, everything just works because they control every variable. But if you're running Raspberry Pis and Nano Pis and you want to listen to a podcast in the kitchen and the bathroom without losing your mind, that's a completely different problem.
And the solutions that exist kind of collapse those two use cases into one design space, which I think is where the frustration comes from. Snapcast, for instance, is technically elegant. The synchronization is genuinely impressive. But it's a server-client architecture that assumes you're going to babysit the plumbing.
Snapcast is interesting because what it actually does well is the sync problem. It timestamps audio chunks and the clients compensate for network jitter, so when it's working you can have sub-millisecond sync across rooms. That part is solved. The brittleness is not in the sync engine, it's in everything around it. How you get audio into Snapcast, how clients reconnect after a reboot, how Music Assistant talks to it. Those integration surfaces are where things fall apart.
So the failure mode isn't the core technology, it's the connective tissue between components.
Which is a general problem with the Home Assistant approach to anything, really. Home Assistant is an orchestration layer, and it's fantastic at that, but it's also a dependency chain. Every integration is a potential break point, and audio is particularly sensitive because it's stateful. If your temperature sensor integration breaks, your thermostat stops updating. Annoying. If your audio integration breaks mid-episode of a podcast, you notice immediately.
By the way, today's episode is being generated by Claude Sonnet four point six, which I mention because it feels appropriate to acknowledge that an AI wrote the script for an episode about making technology actually work reliably. Anyway. So if we're moving beyond Home Assistant for this, what does the landscape look like?
There are really three or four different directions you can go, and they have very different tradeoffs. The first is dedicated audio OS images for Raspberry Pi. Volumio and Moode Audio are the two most mature options here. They're Linux distributions that boot straight into an audio player, handle their own network services, have a web interface, and are designed specifically for this use case.
I've heard Volumio come up a lot. What's the actual experience like for someone who just wants to play a podcast?
Volumio has a plugin architecture, and there's a podcast plugin that handles RSS feeds reasonably well. The web interface is clean. You can set up multiple Volumio instances, and they have a feature called Volumio multiroom that lets you group them. The free tier has some limitations, but for basic playback across a few devices it's functional. The catch is that their multiroom sync is not as tight as Snapcast. For podcasts and spoken word it doesn't matter at all. For music with a strong beat you might notice it across rooms.
And for Daniel's actual use case, podcast playback and audio libraries, the sync tolerance is much wider.
Much wider. Human speech, you can be a couple hundred milliseconds off and it's fine. Music you start noticing at around thirty milliseconds depending on the listener. So for the specific problem Daniel has, the sync requirements are actually relaxed enough that you have more options.
What about Moode?
Moode is a bit more technically focused. It's based on MPD, which is Music Player Daemon, and MPD has been around since the early two thousands. Very stable. The community around it is serious. Moode adds a clean web interface on top of MPD and handles a lot of the configuration for you. For multi-room specifically, Moode has snapserver integration built in, which is interesting because you get the Snapcast sync engine but wrapped in something that's a bit easier to manage.
So you're still using Snapcast under the hood, but without Home Assistant in the middle.
Without Home Assistant, and without Music Assistant, which removes two integration layers. The question is whether removing those layers actually fixes Daniel's brittleness problem or just moves it somewhere else. My suspicion is it helps significantly. Music Assistant in particular has had a rocky development history. It's ambitious, it's trying to do a lot, and the result is that its integrations with streaming services and local libraries can be flaky. If you're using it as the source feeding Snapcast, and Music Assistant hiccups, the whole chain goes down.
So the Home Assistant stack for audio is basically a chain of ambitious projects each of which is trying to be more than it is.
That's a fair characterization. And the alternative is to use tools that do one thing well. MPD does playback and library management well. Snapcast does sync well. The question is what does the interface layer look like, because MPD clients vary wildly in quality.
Can you actually subscribe to podcast RSS feeds through MPD?
Not natively. MPD is fundamentally a music library player. It understands files and streams, so you can point it at a podcast stream URL, but it has no feed management, no episode tracking, no marking episodes as played, none of that. For podcasts specifically you need something else in the chain.
And this is where I think the problem gets genuinely interesting, because Daniel's use case is a hybrid. He wants podcast RSS feeds, which are episodic and stateful, and he wants audio libraries, which are more like a traditional music collection. Those are actually different data models.
They are. And there's a piece of software called gPodder that's been around for a long time, handles RSS feed subscriptions and episode downloads, and can feed into MPD. But that's another component, another thing to configure, another potential break point. The honest answer is that podcast playback in multi-room audio is kind of an underserved use case. The audiophile software ecosystem assumes you're playing FLAC files from a local library. The podcast ecosystem assumes you're on a phone with a dedicated app.
So where does that leave Daniel?
I think there are two realistic paths that don't involve Home Assistant. The first is Volumio with the podcast plugin, accepting that the multiroom sync won't be perfect but will be good enough for speech. Simple to set up, decent interface, one system to manage. The second is a slightly more involved but more powerful setup using a single Snapcast server on one of the Pis, feeding clients on the others, with something like Mopidy as the source rather than MPD.
Mopidy. Tell me about Mopidy.
Mopidy is a music server that's compatible with MPD clients but has a plugin ecosystem that MPD doesn't. There's a Mopidy podcast plugin, there are plugins for various streaming services, there's a plugin for local libraries. The key advantage is that you get the flexibility of plugins for different audio sources, while still outputting to Snapcast for the sync. And because Mopidy and Snapcast are both just services running on Linux, they're not dependent on Home Assistant staying healthy.
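To make the Mopidy-to-Snapcast wiring concrete, here's a minimal sketch of the Mopidy side. The FIFO path and sample format are conventional choices, not requirements, and they have to match whatever the Snapcast server is configured to read:

```ini
# /etc/mopidy/mopidy.conf (fragment) - a sketch, paths are examples
# Decode audio and write raw PCM into a named pipe for Snapcast.
[audio]
output = audioresample ! audioconvert ! audio/x-raw,rate=48000,channels=2,format=S16LE ! filesink location=/tmp/snapfifo
```

The `output` line is a GStreamer pipeline, which is how Mopidy expresses its audio sink; the `filesink` at the end is what turns "play audio" into "write bytes to the pipe."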
So the architecture would be: Mopidy handles the sources, Snapcast handles the distribution, and you have a web interface for control.
And for the web interface, Iris is a really nice Mopidy frontend. It has a clean material design interface, works well on mobile, handles grouping to some extent. Or you can use one of the MPD-compatible clients since Mopidy speaks that protocol.
How does this actually handle the multi-room piece? If I want to play a podcast in the kitchen but not the bedroom, is that manageable without a lot of fuss?
With Snapcast, you can control which clients are in which stream. There's a web interface for Snapcast server, and there are also mobile apps. The Snapdroid app on Android, for example, lets you control which speakers are active. It's not as slick as the Sonos app, but it works. The caveat is that the initial setup requires some command line work. You're editing config files, setting up systemd services. Once it's running it tends to stay running, but getting there has a learning curve.
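On the server side, the matching piece is a sketch of `snapserver.conf`. Again the FIFO path, stream name, and sample format are assumptions that just need to agree with the source feeding the pipe:

```ini
# /etc/snapserver.conf (fragment) - a sketch, values are examples
[stream]
# Read raw PCM from the FIFO the source writes to. The name is what
# shows up in the control apps; the format must match the writer.
source = pipe:///tmp/snapfifo?name=Mopidy&sampleformat=48000:16:2

[http]
# Snapweb, the built-in control page (default port 1780).
enabled = true
```

From Snapweb or an app like Snapdroid you then assign clients to streams and adjust per-room volume.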
And this is the fundamental tension, isn't it. The thing that makes these tools reliable is also what makes them harder to set up. They're not trying to be consumer appliances.
There's actually a middle path worth mentioning, which is that some of these Raspberry Pi audio OS images are moving toward making this easier. Volumio in particular has improved their setup flow considerably. You can be up and running with a basic configuration in under an hour without touching the command line. For Daniel's situation with multiple devices, you'd still need to do some configuration per device, but it's getting better.
What about the Nano Pi specifically? Because that's a slightly different hardware profile than the Raspberry Pi.
The Nano Pi, depending on which model, runs Armbian or FriendlyElec's own OS images. Volumio and Moode are built primarily for the Raspberry Pi. Mopidy and Snapcast, because they're just Debian packages, will run on anything that runs a Debian-based ARM Linux. So the Mopidy plus Snapcast approach actually has better cross-platform compatibility for a mixed hardware situation like Daniel's.
So if you've got Raspberry Pis and Nano Pis in the same setup, the Mopidy and Snapcast route is probably more practical than trying to get Volumio running on the Nano Pi.
The Snapcast clients specifically are very lightweight. A Snapcast client is basically just receiving audio and playing it out. You could run a Snapcast client on almost anything with a Linux kernel and an audio output. The server, where Mopidy is running, wants a bit more resources, but even a Raspberry Pi three handles it fine.
Let's talk about the Kodi and Plex angle, because Daniel specifically asks whether multi-room audio can work as a unified playback source for media centers. This feels like a different problem.
It's a genuinely different problem. Kodi and Plex are video-first media centers that also handle audio. Their audio playback is designed around a single output, the device running Kodi or Plex. Getting them to send audio to a multi-room system is possible but requires some architectural choices.
What are the options?
The cleanest approach is to treat the multi-room system as an audio output device that Kodi or Plex can see. If you're running Snapcast, you can use PulseAudio's pipe-sink module to create a virtual device that Kodi thinks is a local sound card but is actually writing into the FIFO that Snapcast reads from. Kodi plays audio, it goes to the virtual sink, Snapcast distributes it.
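A sketch of what that looks like in PulseAudio's startup script, assuming the same `/tmp/snapfifo` path the Snapcast server reads from:

```ini
# /etc/pulse/default.pa (fragment) - a sketch, path is an example
# Create a virtual sink whose audio lands in the Snapcast FIFO.
# Kodi selects "Snapcast" as its output device; snapserver reads the pipe.
load-module module-pipe-sink file=/tmp/snapfifo sink_name=Snapcast format=s16le rate=48000 channels=2
```

The format, rate, and channel parameters again have to match the stream definition on the Snapcast side, or you get noise or chipmunked audio.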
That's elegant in theory. How does it work in practice?
PulseAudio integration with Snapcast works, but PulseAudio itself can be a source of instability on embedded hardware. It's gotten better, but it's known for occasionally having issues with latency and dropouts on Raspberry Pi. There's also PipeWire now, which is the modern replacement for PulseAudio, and the PipeWire story for this kind of integration is improving but still not fully mature on embedded Linux.
So you're potentially trading Home Assistant brittleness for PulseAudio brittleness.
Possibly. The alternative is to not try to integrate Kodi into the multi-room system at all, and instead use the multi-room system for audio-only content, podcasts and music, and let Kodi handle its own audio when you're watching video. Those are actually pretty separate use cases in practice.
That's a clean separation. You're not trying to make one system do everything.
And I think that's the right mindset for casual users. The audiophile dream is one unified system that handles everything perfectly. The practical reality for someone with a mixed bag of Raspberry Pis and Nano Pis is that you want reliable systems for distinct use cases, not a grand unified architecture that's fragile at every seam.
There's something almost philosophical about that. The audiophile wants perfection at the cost of complexity. The casual user wants reliability at the cost of completeness.
And the Home Assistant approach tries to give you both by wrapping everything in automation and integration, but that wrapper has its own failure modes. Every abstraction layer you add is something that can break.
So let's get concrete about what Daniel should actually do. If you were setting this up tomorrow, what would you do?
I'd start with a single server node, probably the most capable Raspberry Pi in the setup. Install Mopidy, configure the podcast plugin and point it at the RSS feeds Daniel wants, configure the local library plugin for his audio collection. Install Snapcast server on the same machine. Configure Mopidy to output to the Snapcast pipe. Then on every other device, just install Snapcast client. That's it for those devices. They receive audio and play it. No Mopidy, no library management, just the client.
And the interface?
Iris running on the server, accessible from any browser on the network. You'd bookmark it on your phone. From there you can browse podcasts, play episodes, control playback. For multi-room control, the Snapcast web interface or a mobile app handles which speakers are active.
How long does that actually take to set up?
For someone comfortable with a terminal, the server setup is probably two to three hours including configuration and testing. The clients are maybe fifteen minutes each. The ongoing maintenance is minimal because you're not running a complex orchestration layer. If a client reboots, it comes back and reconnects to the server automatically. If the server reboots, everything reconnects. That's a much more stable failure mode than the Home Assistant stack where you have multiple services that need to come up in the right order.
The reconnection behavior matters a lot for reliability. One of the things that makes Snapcast fragile in the Home Assistant context is that Home Assistant itself needs to be healthy for integrations to work. If you take Home Assistant out of the picture, you're down to services that have much simpler dependencies.
Systemd handles service startup and restart on modern Linux, and if you configure the Mopidy and Snapcast services to restart on failure, the system is reasonably self-healing. Not perfect, but much more so than a stack with five or six integration layers.
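The restart-on-failure behavior is a small systemd drop-in. This is a sketch of an override for the client service; the same shape works for `mopidy` and `snapserver`:

```ini
# Created via: systemctl edit snapclient
# /etc/systemd/system/snapclient.service.d/override.conf
[Service]
# If the process dies (network blip, audio device hiccup), bring it
# back automatically after a short delay.
Restart=on-failure
RestartSec=5
```

After editing, `systemctl daemon-reload` picks up the override.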
Is there a case for the fully managed appliance approach? Like, is there something you can just buy that does this?
The honest answer is Sonos is the thing you just buy that does this. And it's actually not that expensive if you're buying a few speakers. The trade-off is you're locked into their ecosystem, you're dependent on their servers for some features, and you're not using hardware you already have. For Daniel who already has the Pis, buying Sonos feels like admitting defeat. But for someone starting from scratch who wants reliable multi-room audio without the DIY overhead, Sonos still makes a lot of sense.
What about WiiM? I've seen those come up as a middle ground.
WiiM is interesting. They make network audio streamers that plug into existing speakers, and they have pretty solid multi-room support through their own app. The WiiM Home app handles grouping, it supports AirPlay two, Chromecast Audio, Tidal Connect, various protocols. The devices are around eighty to a hundred fifty dollars each. They're not as locked-in as Sonos because they support open protocols. But again, if you already have the Pis, you're essentially paying for hardware you don't need.
Let's talk about the RSS podcast angle specifically because I think that's undersold as a challenge. Most of the multi-room audio ecosystem is built around music. Podcasts are a different beast.
The core difference is that podcasts are episodic with state. You need to know which episodes you've listened to, where you are in an episode, and the feed itself is dynamic, new episodes appear. Music libraries are relatively static. The Mopidy podcast plugin handles feed subscriptions and episode listing, but it doesn't have the sophisticated state tracking of a proper podcast client. If you pause an episode and come back the next day, resuming from exactly where you left off is not guaranteed.
That's a real limitation.
It is. There's a workaround that some people use, which is to run a podcast aggregator separately, something like Podgrab or Audiobookshelf, which handles feed subscriptions and downloads episodes to a local library. Then Mopidy or MPD treats those downloaded files as part of the local library. You get proper episode management from the aggregator and reliable playback from the audio server.
Audiobookshelf is interesting because it also handles audiobooks, which might be relevant for Daniel.
Audiobookshelf has become really solid. It runs as a Docker container or a native install, has a clean web interface, handles podcasts and audiobooks, tracks playback position, syncs across devices. If you run Audiobookshelf as the podcast management layer and point Mopidy at the download directory, you get the best of both. Audiobookshelf handles the feed logic, Mopidy and Snapcast handle the multi-room playback.
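The glue between the two is just a shared directory. A sketch of the Mopidy side, assuming a hypothetical download path that you'd replace with whatever Audiobookshelf is actually configured to use:

```ini
# /etc/mopidy/mopidy.conf (fragment) - path is an example, not prescriptive
# Point Mopidy-Local at the directory Audiobookshelf downloads into.
[local]
media_dir = /srv/audiobookshelf/podcasts

[file]
# Optional: browse the same tree directly, without a library scan.
media_dirs = /srv/audiobookshelf/podcasts|Podcasts
```

With Mopidy-Local you'd run `mopidy local scan` (or schedule it) after new episodes land; the `[file]` backend sidesteps that by browsing the filesystem live.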
So the full stack would be Audiobookshelf for podcast management, Mopidy for library playback and source management, Snapcast for distribution, and Iris for the playback interface.
That's four components, which sounds like a lot, but each one is doing a distinct job and they interact through simple interfaces. Audiobookshelf writes files to a directory. Mopidy reads from that directory. Mopidy writes audio to a pipe. Snapcast reads from the pipe. The interfaces are file system and audio pipe, not complex APIs that break when versions change.
There's something satisfying about that architecture. It's Unix-ish. Each component does one thing, they compose.
And the failure modes are isolated. If Audiobookshelf has a problem, your existing downloaded episodes still play. If Mopidy has a problem, Audiobookshelf still manages your feeds. The components don't cascade failures into each other the way tightly coupled systems do.
Let's think about the practical takeaways for someone in Daniel's situation. He's got working hardware, he's got frustration with the current setup, he wants to know what to actually do.
The first thing I'd say is: don't try to fix the Home Assistant stack. The brittleness he's experiencing is structural. Music Assistant is doing a lot of work to integrate many sources, and that complexity is the source of the fragility. Adding more configuration or updating versions might help temporarily but won't solve the underlying architecture problem.
Cut your losses and start fresh.
The second thing is: decide whether you want a single unified interface or are happy with two interfaces for two use cases. If you want one interface for everything including podcasts and music, the Mopidy plus Audiobookshelf approach gives you that, though it requires some setup. If you're happy using Audiobookshelf's own interface for podcast management and a separate interface for multi-room playback, you have more flexibility.
And the third thing?
Accept the setup cost. The reliable solutions require a few hours of configuration work up front. The payoff is a system that runs for months without needing attention. The Home Assistant approach promises you avoid that setup cost but actually just defers it into ongoing maintenance. The Mopidy and Snapcast approach has a higher upfront cost and much lower ongoing cost.
What about someone who wants to start even simpler? Is there a minimal version of this that gets you most of the benefit?
The absolute minimum that actually works reliably is Mopidy on one Pi with the local library plugin, no Snapcast at all, with a Bluetooth speaker or a connected speaker in the room you're in most of the time. Add Snapcast later when you want multi-room. The system is incrementally expandable.
And for the Nano Pi devices specifically?
Install Snapcast client from the Armbian repository, configure it to point at the server IP, and you're done. The Nano Pi doesn't need to run anything complex. It just receives audio and plays it. The resource requirements for a Snapcast client are minimal, we're talking about something that runs comfortably on a Nano Pi Zero.
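On a Debian-based image, the client's configuration is typically one line in its defaults file. The IP here is a placeholder for wherever the server actually lives:

```ini
# /etc/default/snapclient (Debian/Armbian packaging) - IP is an example
# Point the client at the server explicitly; without -h it falls back
# to mDNS discovery, which is less predictable on some networks.
SNAPCLIENT_OPTS="-h 192.168.1.10"
```

Then `systemctl enable --now snapclient` and the device is done.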
One thing I want to push back on slightly, and I'm genuinely uncertain here, is whether the Mopidy project is still actively maintained. Because recommending software that's going stale is its own kind of reliability problem.
That's a fair concern and I'm glad you raised it. Mopidy's development has been slower in recent years. The core is stable and well-tested, but some of the plugins, including the podcast plugin, have had periods where they weren't actively maintained. Before committing to this stack, I'd check the GitHub repositories for recent activity. If the podcast plugin is showing commits from the last six months, you're probably fine. If the last commit was two years ago, you might want to look at alternatives or be prepared to handle some issues yourself.
What's the alternative if Mopidy's ecosystem is too stale?
MPD with the Beets music library manager for the local library, and a separate podcast solution feeding into the same Snapcast server. It's slightly less elegant because you're managing two playback sources, but MPD is extremely actively maintained and has been for over twenty years. It's not going anywhere.
That's actually reassuring in a way. The boring, old, stable tools are sometimes exactly what you want for infrastructure.
There's a reason sysadmins still use tools from the nineties. Not because they're hip, but because they work and they're understood. MPD is in that category. Snapcast is newer but has a focused scope and active development. Those are good signals for something you're going to rely on.
Is there anything in the newer generation of tools worth mentioning? Or is the right answer here to lean on the boring reliable stuff?
There's a project called Navidrome that's worth knowing about. It's a music server that speaks the Subsonic API, which means it has a huge ecosystem of clients. It's actively developed, has a good web interface, handles large libraries well. For music specifically, Navidrome plus a Subsonic client is a genuinely good experience. The gap is still podcasts. Navidrome doesn't handle podcast feeds.
So Navidrome for music, Audiobookshelf for podcasts, Snapcast for multi-room distribution. That's actually a pretty clean three-component system.
And both Navidrome and Audiobookshelf can output to Snapcast with some configuration. You'd have the Snapcast server running, and either source can pipe audio into it. The complexity is in the switching, if you're listening to music via Navidrome and want to switch to a podcast from Audiobookshelf, those are two different applications and you'd need to stop one before starting the other in the Snapcast pipe.
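Snapcast itself supports multiple named sources, so the switching lives in the server config. A sketch with two pipes, with the caveat that getting each application to actually write into its pipe takes extra plumbing (for example a PulseAudio or PipeWire pipe sink per application):

```ini
# /etc/snapserver.conf (fragment) - a sketch, two example sources
[stream]
source = pipe:///tmp/music_fifo?name=Music&sampleformat=48000:16:2
source = pipe:///tmp/podcast_fifo?name=Podcasts&sampleformat=48000:16:2
```

A group of clients is assigned to one stream at a time from Snapweb or a mobile app, which is exactly the manual switching step being described.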
That's a real usability friction point.
It is, and it's the kind of thing that Home Assistant's Music Assistant was trying to solve by being the unified layer. The problem is that being the unified layer is hard and fragile. There's no perfect answer here. You either accept some usability friction in exchange for reliability, or you accept some fragility in exchange for a seamless interface.
For Daniel's actual use case, how often is he really switching between music and podcasts in the middle of a session?
Probably not that often. And if you establish a convention, podcasts in Audiobookshelf, music in Navidrome, and you only have one running at a time, the friction is manageable. It's not as slick as a unified interface, but it's predictable, which is underrated.
Predictability is underrated. I think that's actually the core insight of this whole conversation. The Home Assistant approach optimizes for the happy path, everything working seamlessly together. The alternative optimizes for predictable behavior including predictable failure modes.
And for a casual user who just wants to hear a podcast while making dinner, predictable is more valuable than seamless. If something breaks, you want to know exactly where it broke and how to fix it. With a simpler architecture, that's much more achievable.
Alright, let me try to land this. If I were Daniel, here's what I'd actually do. Step one: pick one Pi as the server, install Mopidy and Snapcast server. Step two: install Snapcast client on every other device. Step three: install Audiobookshelf for podcast management, point it at a download directory that Mopidy also reads from. Step four: use Iris as the main playback interface, Audiobookshelf's own interface for managing podcast subscriptions. Step five: stop trying to integrate this with Home Assistant and accept that multi-room audio is a separate system.
That's a solid plan. I'd add one thing: document your configuration. Write down what you did, what IP addresses you used, what config file settings you changed. Future you will be grateful when something needs to be rebuilt. It doesn't need to be fancy, even a text file in a shared folder.
The Raspberry Pi user's equivalent of commenting your code.
The number of times I've seen someone set up a perfectly working system, have a hardware failure six months later, and have no idea how to reconstruct it... document everything.
On the Kodi and Plex question, I think the honest answer is: don't try to route Kodi or Plex audio through the multi-room system for video content. For audio-only content, if you're using Kodi as a music player, you can configure Kodi to use a PulseAudio or PipeWire output that feeds into Snapcast, but it's an additional layer of complexity that may not be worth it. Keep Kodi for video, use the dedicated audio stack for audio.
That's where I land too. The integration is possible but the marginal benefit for a casual user doesn't justify the added complexity. Kodi has its own audio player that works fine for the device it's running on. Multi-room audio is a separate use case.
Good. I think we've covered the actual territory here. It's a case where the answer is genuinely "step back from the complex thing and use simpler tools that compose well." Which is a slightly boring answer but probably the right one.
The boring answer is often the right answer in infrastructure. The exciting multi-room audio setup is the one that breaks at dinner time. The boring one is the one you forget is running because it just works.
On that note. Thanks to Hilbert Flumingtop for producing, as always. And a quick word for Modal, who make the serverless GPU infrastructure that runs our pipeline. If you're doing anything compute-intensive, modal.com is worth a look. This has been My Weird Prompts. If you want to catch all two thousand one hundred forty-seven episodes, head to myweirdprompts.com. We're also on Telegram if you want to be part of the conversation.
Until next time.