Daniel sent us this one — and it's a big one, more of a reflection than a question really. He's been running this project for a while now, building an AI-generated podcast pipeline, open-sourcing the whole thing, funding the API calls himself. And he's asking whether we're moving toward a world where this kind of content becomes normal. Not just technically feasible, which it already is, but socially accepted. Whether AI-generated podcasts get their own name, their own platforms, their own genre conventions. And whether the platforms that exist today, Spotify and the rest, are going to make space for this or keep treating it like something you have to sneak past the bouncer.
The bouncer metaphor is exactly right. And the interesting thing is, the bouncer hasn't actually thrown anyone out yet. I've been tracking this. Spotify's current policy — and this is as of their latest creator guidelines — doesn't ban AI-generated content outright. What they ban is deceptive AI content. Synthetic voices are fine as long as you're not impersonating a real person without consent. But here's the rub: the policy language is vague enough that it creates exactly the feeling Daniel's describing. You're not violating terms, but you're not sure you're welcome either.
The "loitering outside the venue" phase of a new medium.
That's it. And it's not just Spotify. Apple Podcasts, same deal. They'll accept the RSS feed, they'll distribute it, but there's no category for this, no acknowledgment that it exists as a distinct thing. You're in the general podcast directory next to Serial and The Daily, and the implicit message is "we'll let this slide, but we're watching."
Which is a weird message to send to someone who's building the whole thing in the open, with a disclaimer, not monetizing, treating it as an educational project.
That's the tension, right? Daniel's project is about as above-board as it gets. Open-source pipeline, no deception, clear labeling. He's not trying to pass you and me off as human hosts. Anyone who listens for thirty seconds knows something is different. But the platforms haven't built the conceptual box for "different" yet. They have boxes for "podcast" and "not a podcast." No box for "synthetic educational dialogue between a sloth and a donkey."
The sloth and donkey thing probably isn't helping with the categorization problem.
It's not hurting either. But the deeper question he's asking is about normalization. When does the weird thing stop being weird? And I think there's a real parallel here with other media transitions. When YouTube started, the idea of someone filming themselves talking to a camera in their bedroom was considered bizarre and slightly sad. Now it's a multi-billion-dollar industry and nobody blinks. The weirdness isn't inherent to the format, it's just unfamiliarity.
The "vlogging in your bedroom" to "multi-billion-dollar industry" pipeline is basically the California gold rush in sweatpants.
The timeline on these things is compressing. It took YouTube maybe eight years to go from weird to normal. For AI-generated content, I suspect the normalization window is going to be much shorter, because the infrastructure for distribution already exists. The pipes are laid. It's just the cultural acceptance that's lagging.
The question becomes: what accelerates cultural acceptance? Because Daniel's been doing this for a while now, and he's still feeling that friction. What actually tips it?
A few things, I think. One is sheer volume. When there are ten thousand AI-generated podcasts instead of a few hundred, platforms have to respond. They can't just quietly tolerate it. They have to either embrace it or explicitly ban it, and bans are hard to enforce at scale. The second thing is quality differentiation. Right now, the stigma around AI-generated content is a hangover from the early low-effort spam podcasts. The ones that were just someone piping Wikipedia articles through a TTS engine and calling it a day.
The "text to speech reads the dictionary" genre. Very competitive space.
That's what people picture when they hear "AI podcast." But what Daniel's built is fundamentally different. It's character-driven, it's edited, it has comedic timing, it has a point of view. The AI isn't replacing the creative work, it's enabling a different kind of creative work. The human is still making all the editorial decisions. The AI is handling the execution layer.
This is the distinction that gets lost in every conversation about AI content. People assume "AI-generated" means "human didn't do anything." But Daniel's prompt alone for this episode is what, a thousand words? That's more editorial direction than most human-hosted podcasts get before an episode.
That's the thing I wish platforms understood. The prompt is the creative act. The script generation and the voice synthesis are production. It's the difference between writing a screenplay and acting in the film. Daniel wrote the screenplay. We're the actors, except we're synthetic actors performing a script that was itself synthesized from his direction. The human authorship is still there, it's just moved up a layer of abstraction.
The "prompt as creative act" framing is probably the most important intellectual property question nobody's asking yet. If I write a five-thousand-word prompt that produces a novel, who wrote the novel?
Current U.S. Copyright Office guidance says purely AI-generated works aren't copyrightable, but works with sufficient human creative input are. The question is where the line is. And Daniel's project sits right on that line in an interesting way. He's not just prompting "make a podcast about AI." He's providing detailed context, editorial direction, character backgrounds, tone guidelines. The output is heavily shaped by human choices.
The legal framework is already bending toward recognizing this as a legitimate form of authorship. The platform policies just haven't caught up.
They won't until there's either a crisis or a success story that forces the issue. Right now, AI podcasts are small enough to ignore. But Google's NotebookLM — you know, the Audio Overview feature they launched — that was a signal. When Google ships a feature that generates two-person podcast-style dialogues from your documents, and millions of people use it, the "is this a real podcast" question starts to look a little silly.
NotebookLM was basically Google saying "we've looked at what Daniel and others are doing, and we're productizing it."
And Google didn't ask permission. They just shipped it. The Audio Overviews generate these slightly uncanny but genuinely useful synthetic conversations between two AI hosts. People are using them to digest research papers, meeting notes, all kinds of things. It's not entertainment, it's a consumption tool. But it's the same underlying technology, and it's normalized the experience of listening to synthetic voices discuss content for millions of people.
The "uncanny but useful" phase is always the bridge. That's where we are. It's not fully natural yet, but it's past the point where it's a novelty. People are integrating it into their workflows.
That's exactly what Daniel described in his prompt — he started this because he had a consumption problem. He was generating all this AI output, didn't have time to read it, realized audio was his preferred consumption mode, and built a pipeline to convert text to podcast format. That's the same use case NotebookLM is targeting, just with a different interface.
The consumption problem is under-discussed. Everyone's focused on AI as a production tool — generate this, generate that. But the bottleneck is attention. You can generate a hundred pages of analysis in seconds. Reading it still takes hours. Audio is a compression trick. You can consume while doing dishes.
The retention argument he made is interesting too. He mentioned that consolidation of learning happens most effectively after a gap — you read the main points, then a week or two later you revisit it. Audio lets you do that passive revisit without dedicating focused reading time. It's spaced repetition by accident.
The "spaced repetition by accident" thing is basically how I've learned everything I know about battery chemistry. Just let Herman talk at me while I'm napping.
You're welcome. But to get back to his core question — where does this live? What's the platform that says "we get you"? I think there are really three possibilities here, and they're not mutually exclusive.
Lay them out.
Option one: existing platforms adapt. Spotify creates an "AI-generated" category or a content label, similar to how they handle explicit content. It's not hidden, it's not banned, it's just tagged. Listeners can filter if they want. This is the path of least resistance, and I'd bet it happens within the next couple of years, simply because the volume of AI content is going to make it unavoidable.
The "we can't ignore you anymore so here's a checkbox" approach.
It's not glamorous, but it works. Option two: a new platform emerges that's purpose-built for synthetic media. Think of it like what Twitch did for live streaming. YouTube had live video, but Twitch was built for it from the ground up, with features that made sense for that specific format. You could imagine a platform where the atomic unit isn't an audio file, it's a script plus voice profiles.
This connects to the idea he mentioned about decoupling text and voices. Letting listeners choose their own voices for the same episode.
And that's technically straightforward but incompatible with how podcast distribution currently works. A platform built for this could handle it natively. You upload the script, you define character slots, and the listener's app renders it with their preferred voices. Maybe even their own voice clone, if they want to insert themselves as a character. It's a fundamentally different model than "here's a static MP3 file."
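[Producer's note: the "script plus character slots" model can be sketched as structured data. This is purely an illustration of the idea, not any real platform's API — every name here is hypothetical, and a real player would hand the (voice, text) pairs to a TTS engine.]

```python
# Sketch of the hypothetical "script plus voice profiles" model:
# the episode asset is a script with abstract character slots,
# and the listener's app binds slots to voices at playback time.
from dataclasses import dataclass

@dataclass
class Line:
    character: str  # abstract slot, e.g. "host_a", not a fixed voice
    text: str

# The episode as distributed: no audio baked in, just the script.
episode = [
    Line("host_a", "Welcome back to the show."),
    Line("host_b", "Today we're talking about platform policy."),
]

# Listener-side preference: which synthetic voice renders each slot.
voice_bindings = {"host_a": "warm_baritone_v2", "host_b": "dry_tenor_v1"}

def render(script, bindings):
    """Pair each line with the listener's chosen voice. A real app
    would stream these pairs through a TTS engine; here we just
    return them to show the decoupling of text from voices."""
    return [(bindings[line.character], line.text) for line in script]

for voice, text in render(episode, voice_bindings):
    print(f"[{voice}] {text}")
```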
The "build your own radio drama" model. That's actually compelling as a listener experience. I'd listen to myself argue with myself.
You basically do that already. But yes, the personalized audio experience is where this gets interesting. And option three is the one I think is most likely in the short term: this becomes a feature of AI platforms, not a separate destination. Anthropic, OpenAI, Google — they all have or will have audio output capabilities. Your AI assistant generates a podcast-style summary of your reading list and plays it for you. It's not distributed on Spotify, it's generated on demand in your AI app.
The podcast as a private, personalized medium rather than a broadcast one.
And that's actually closer to Daniel's original use case. He started this as a private thing, saving episodes to Google Drive for his own consumption. The public distribution came later, partly for convenience and partly for sharing. But the core value proposition — "I want to consume information in audio form while doing other things" — doesn't require a public platform at all. It just requires a good pipeline.
The public distribution is almost a side effect of the consumption solution. He wanted it on Spotify because Spotify is more convenient than Google Drive for accessing audio on the go. Not because he was trying to build an audience.
Though the audience happened anyway, which is interesting. And I think that speaks to something real about the quality bar. If the content were terrible, nobody would listen, regardless of how convenient the distribution was. The fact that people do listen — that it's built a following — suggests the format works on its own terms, separate from the novelty of how it's made.
The "if it's good, it's good" test. Which is ultimately the only test that matters for any medium.
That's what I keep coming back to. The platforms can hem and haw about policy, but audiences don't care about production methodology. They care about whether the thing is worth their time. If an AI-generated podcast is informative and entertaining, people will listen. If it's not, they won't. The same as any other podcast.
The production methodology is a producer concern, not a consumer concern. Nobody listens to Radiolab and thinks "I hope they used high-quality microphones." They just listen.
And the microphone anxiety is exactly what Daniel's describing. He's worried about whether the tools he's using are "allowed." But from a listener perspective, the question is just "is this good?" The rest is inside baseball.
Let's talk about the video question, because he raised that too. Audio-to-video, animated versions, a YouTube for AI-generated content. That's a whole other layer.
The video question is fascinating because it's actually easier to solve technically than the audio distribution problem was five years ago. The AI video generation tools now — Sora, Runway, Kling — they're not quite at the point where you can generate a full animated episode from a script for pocket change, but they're getting close fast. And the thing Daniel's describing, the post-facto animation, is actually the smart way to do it. Generate the audio first, which is cheap and reliable, then use the audio as the driver for the visuals.
The audio becomes the scaffolding for the video, rather than trying to generate both simultaneously.
Which is how traditional animation works anyway. You record the voice actors first, then animate to the performance. The AI isn't inventing a new workflow, it's just making the existing workflow radically cheaper.
The "radically cheaper" part is what changes the economics of niche content. You don't need a million views to justify the production cost if the production cost is near zero.
That's the thing about the YouTube question. YouTube's current policy on AI-generated content is actually more evolved than Spotify's. They require labeling for realistic AI-generated content, but they don't ban it. There's already a thriving ecosystem of AI video content on YouTube, some of it quite sophisticated. The platform exists. The question is more about audience expectations for this specific format.
What would a My Weird Prompts video even look like? Animated sloth and donkey? That's either charming or deeply unsettling, and I'm not sure which.
I'd vote for charming. But the real question is whether the video adds enough to justify the production effort, even if that effort is low. For an educational podcast, the value of video is mostly about discoverability and accessibility. YouTube is the second-largest search engine. Being on YouTube means being findable. The visual component doesn't have to be spectacular, it just has to exist.
The "we're on YouTube because that's where search is" argument. Not because the medium demands video, but because the distribution demands it.
That's actually a perfect encapsulation of the whole platform question. The format and the distribution are different problems. The format — AI-generated educational dialogue — works. The distribution is a policy and platform-design problem that hasn't been solved yet, but will be, because the content is coming whether platforms are ready or not.
The normalization timeline. Give me your best guess. When does an AI-generated podcast on Spotify feel as normal as a vlog on YouTube?
I'd say we're in the early adopter phase right now. The NotebookLM launch was the crossing-the-chasm moment where mainstream audiences got exposed to the format. The next phase is platform acknowledgment — explicit policies, categories, maybe monetization options. I'd put that at eighteen to twenty-four months out. Full normalization, where nobody thinks twice about it, probably three to five years.
Three to five years is also the window where the technology gets good enough that you can't tell the difference. The voices are natural, the dialogue is coherent, the comedic timing lands. At that point, the distinction between AI-generated and human-hosted becomes academic for most listeners.
That's both exciting and slightly terrifying, depending on your perspective. For someone like Daniel, who's been transparent from day one, the improvement in quality is pure upside. The content gets better, the stigma fades, the platforms open up. For bad actors who want to deceive, it gets easier. But that's true of every media technology.
The transparency is the differentiator. The disclaimer, the open-source pipeline, the clear labeling. If the norm becomes "AI-generated content must be labeled," then the people who are already labeling it are ahead of the curve. They're not playing catch-up with regulation, they're already compliant with rules that haven't been written yet.
That's actually a competitive advantage if you think about it long-term. When Spotify does introduce an AI content label, Daniel's show already has one. It's not scrambling to comply, it's already there. The early transparency pays off in trust and platform readiness.
The "we were transparent before it was cool" positioning.
And it's not just positioning, it's genuine. The show has never tried to hide what it is. The sloth and donkey thing kind of gives it away.
I resent the implication that a sloth hosting a podcast is inherently suspicious.
I'm not saying it's suspicious. I'm saying it's a signal. Nobody listening thinks they're hearing two human brothers from Connecticut and Mongolia respectively. The artifice is part of the charm.
The artifice is the genre. That's actually a good way to put it. The fact that it's AI-generated isn't a bug, it's a feature. The characters, the format, the knowingness about what it is — that's the show's identity.
That's what I mean about this becoming its own genre. Right now, "AI-generated podcast" is a production category. But it's going to become an artistic category. Like "found footage" in film. It's not just how it was made, it's a set of aesthetic conventions that audiences recognize and have expectations around.
The "found footage" comparison is good. Blair Witch Project didn't invent handheld cameras, but it defined a genre where the production method is part of the storytelling. The rough edges aren't flaws, they're texture.
That's where I think Daniel's project is actually ahead of the curve in a way he might not fully appreciate. He's been focused on the technical pipeline and the platform acceptance question, which are real concerns. But the creative product he's built — the characters, the dynamic, the mix of educational content and absurdist humor — that's the hard part. The technology is just the delivery mechanism.
The technology is the easy part now, which is a wild thing to say, but it's true. The prompt engineering, the character design, the editorial voice — that's the craft. The TTS and the script generation are commodities.
Commodities get cheaper and better over time. The craft is what differentiates. So the advice I'd give, if Daniel were asking for advice, which he sort of is, is to lean into the craft. The platform stuff will sort itself out. The normalization is happening whether anyone wants it to or not. The question isn't whether AI-generated podcasts will be accepted, it's which ones will be worth listening to when they are.
The "build something good and the platforms will figure out how to host it" approach. Historically, that's worked more often than not.
It's worked for every new medium. Podcasts themselves were in this position fifteen years ago. "Is this a real thing? Will anyone listen? Will platforms support it?" Now podcasting is a multi-billion-dollar industry and nobody asks those questions. The cycle repeats.
To loop back to his specific questions. One: is this becoming normal? Yes, we're on the curve, NotebookLM accelerated it, the platforms are lagging but they'll catch up. Two: will it get its own name? The "synthetic media" label is gaining traction, but I suspect something more specific will emerge for the podcast format. Three: dedicated platforms? Eventually, but the existing platforms will adapt first. Once the platforms have policies, monetization follows. Advertisers go where audiences are, and they don't care about production methodology if the numbers work.
And on the video question: the tools are there or nearly there. The distribution question is the same as audio — YouTube already allows it, the question is whether a specialized platform emerges. My guess is no, because YouTube is good enough and the network effects are too strong. But the production of video from audio is going to become trivially easy within the next year or two.
The "trivially easy" part is what changes the calculus. When you can generate a decent animated version of an episode for a few dollars in compute, why wouldn't you? Even if it only gets a few hundred views on YouTube, that's a few hundred people who might not have found the audio version.
The discoverability point is real. Podcast discovery is broken. It's been broken for years. YouTube discovery actually works. So even if the primary product is audio, having a video presence is a distribution strategy, not a format change.
The "video as SEO for your podcast" approach. I like it.
It's practical. And practicality is what drives adoption. Daniel started this whole thing because reading AI output was impractical and listening was practical. Every decision since then has been about reducing friction. Distribution on Spotify reduces friction compared to Google Drive. Video on YouTube reduces friction compared to hoping people stumble across the RSS feed. The entire project is a series of friction-reduction moves.
The "friction reduction as creative philosophy" framework. That's actually a useful lens for thinking about AI content in general. The technology reduces production friction. The platforms should reduce distribution friction. When both are low, the creative work can actually be creative instead of logistical.
That's the promise, right? That's what's exciting about this. Not that AI replaces human creativity, but that it removes the boring parts so humans can focus on the interesting parts. Daniel's not spending hours editing audio or reading prompts into a microphone. He's thinking about character development, topic selection, comedic dynamics. The AI handles the plumbing.
The plumbing metaphor is appropriate for a show hosted by a sloth and a donkey.
We're nothing if not dignified. But seriously, I think the answer to his broader question — "where does this live?" — is that it lives wherever the audience is. Right now that's Spotify and Apple Podcasts and the website. Eventually it might be a dedicated platform. But the content is platform-agnostic. The script is the asset. Everything else is rendering.
The script as the canonical form of the work. That's a novel idea for podcasting. The MP3 file is not the thing. The script is the thing. The MP3 is just one possible rendering of the script.
That unlocks all kinds of interesting possibilities. Translate the script into other languages. Render it with different voices for different markets. Turn it into a comic. Turn it into a video. The script is the source of truth, and everything else is derivative. That's a much more flexible model than traditional podcast production.
The "write once, render anywhere" model for content. It's basically what the web was supposed to be for text.
It's only possible because AI makes the rendering step cheap. You're not paying voice actors to re-record in fifteen languages. You're running the script through TTS engines. The marginal cost of a new rendering is near zero.
Which brings us back to monetization. If the marginal cost is near zero, the path to sustainability is different than a traditional podcast. You don't need a massive audience to cover production costs because production costs are minimal. You just need enough to cover the API calls and maybe pay for some tooling.
Daniel mentioned this — he's funding it himself, not making a cent, treating the knowledge gain as the payback. That's a totally valid model for an educational project. But if he wanted to monetize, the low cost structure means even modest revenue would make it sustainable. A few hundred dollars a month in sponsorships or listener support would cover the entire operation.
The "hobby that pays for itself" threshold is very low for AI-generated content. That's actually a structural advantage over traditional podcasting. A human-hosted show with comparable production quality might need thousands of dollars a month just to break even.
That structural advantage is going to attract more creators to the format. Which accelerates the normalization. Which brings the platforms along. It's a virtuous cycle.
We've answered the question, I think. The normalization is happening. The platforms will adapt. The name will emerge. The monetization will follow. The video is coming. The interesting question isn't "if" but "what does the mature form look like?"
I think the mature form looks something like what Daniel's already built, but with better voices, more personalization, and platform-native distribution. The core insight — that AI-generated dialogue is a useful way to consume information — doesn't change. The execution gets smoother. The rough edges get sanded down. But the fundamental format is sound.
The "fundamental format is sound" verdict. I'll take it.
I'll add one more thing. The fact that this project exists at all — that a single person with a good idea and some technical skills can build a functioning AI podcast pipeline and distribute it to the world — that's remarkable. Five years ago this would have been impossible. Ten years ago it would have been science fiction. The fact that we're debating platform policy and genre categories instead of technical feasibility tells you how far things have come.
The "we're debating the wrong thing because the right thing is already solved" observation. That's a good place to land.
And now: Hilbert's daily fun fact.
Hilbert: In the 1950s, the vivid green flash sometimes seen at sunset was given the name "emerald ray" by a British mountaineer who observed it from a ridge in Bhutan — though the phenomenon had been described by Jules Verne decades earlier, and the mountaineer's name for it never caught on.
The emerald ray. Sounds like a rejected superhero.
Jules Verne beat him to it and the name still didn't stick.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop for keeping the ship running. If you want more episodes, you can find the full archive at myweirdprompts dot com, or search for us on Spotify. We'll be back soon.