#2334: How AI Flattens Your Voice in Emails

Why AI-generated emails feel impersonal and how to reclaim your authentic voice in professional communication.

Episode Details
Episode ID
MWP-2492
Published
Duration
21:41
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
Claude Sonnet 4.6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

AI tools are increasingly integrated into professional writing workflows, but their output often lacks the personal touch that defines authentic communication. This raises a critical question: how can we use AI without losing our unique voice in emails and other personal writing?

The core issue lies in how AI models are trained. They optimize for clarity and coherence across vast datasets, converging on a "mean" style that is competent but generic. This works well for tasks like summarizing documents or drafting boilerplate text but falls short when the task is relational. When writing to someone who knows you, AI’s tendency to flatten idiosyncrasies creates a noticeable gap. Readers may not consciously identify the problem, but they often feel a subtle distance—like the email was written for anyone, not specifically for them.

Authentic communication relies on emotional texture and personal specificity. For example, a real email might reference a shared joke or use a tone that reflects the relationship’s history. AI, however, defaults to a homogenized corporate register that strips out these nuances. While surface-level mimicry—like emulating sentence structure or vocabulary—is possible, deeper aspects of voice, such as rhythm and emotional tone, remain elusive.

Technical solutions exist but require deliberate effort. Fine-tuning, particularly using techniques like LoRA (Low-Rank Adaptation), allows models to adapt to individual writing styles by adjusting weights without retraining the entire model. This approach can reduce tone inconsistencies by up to 60%, though it’s not a perfect solution. To be effective, fine-tuning requires a substantial corpus of personal writing—ideally 200-300 examples—where the writer’s authentic voice is evident.
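The arithmetic behind LoRA's efficiency is easy to sketch. Below is a minimal illustration in plain Python of the update rule from the original LoRA formulation, W' = W + (α/r)·BA, for a square d×d weight matrix; the dimensions and rank used in the demo are illustrative, not taken from any particular model.

```python
# Minimal sketch of the LoRA update rule: W' = W + (alpha / r) * B @ A.
# Pure Python, illustrative dimensions only -- real adapters use the
# model's hidden size (thousands) and ranks like 8 or 16.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, A, B, alpha):
    """Apply a low-rank adaptation to frozen weights W (d x d)."""
    r = len(A)                       # rank = number of rows in A
    scale = alpha / r
    delta = matmul(B, A)             # d x d update built from only 2*d*r params
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

def trainable_params(d, r):
    """Parameters LoRA trains for one d x d layer vs. full fine-tuning."""
    return 2 * d * r, d * d

if __name__ == "__main__":
    lora, full = trainable_params(4096, 8)
    print(f"LoRA rank 8 on a 4096x4096 layer: {lora:,} params "
          f"vs {full:,} for full fine-tuning")
```

The point of the `trainable_params` comparison is the whole pitch: a rank-8 adapter on a 4096-wide layer trains roughly 65 thousand parameters instead of nearly 17 million, which is why the compute cost is a fraction of full fine-tuning.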

Prompting strategies can also help, but they demand careful context feeding to guide the model toward specificity. The challenge is balancing convenience with authenticity, as AI’s default optimization often works against personalization.

Historically, similar concerns arose when typewriters replaced handwritten letters. Today, the debate continues, but with higher stakes: AI’s influence is invisible, making the authenticity gap harder to detect and address. The episode concludes by emphasizing that preserving personal voice in AI-assisted writing requires intentional intervention, whether through fine-tuning, advanced prompting, or other methods.


Transcript

Corn
Daniel sent us this one, and it's something I suspect a lot of listeners have bumped into personally. He's asking about AI and personal writing, specifically emails and communications where your actual voice matters, not just the information. The problem he's circling is this: AI is everywhere in writing workflows now, but the output often sounds like it was written by a committee that's never met you. So what do you actually do about that? Fine-tuning, custom tools, something else? And can you get there without becoming a machine learning engineer in the process?
Herman
The committee line is exactly right. There's a very specific texture to AI-generated prose that people have started to recognize almost instantly, and it's not just about vocabulary. It's about the rhythm, the emotional temperature, the way a real person's writing has these little idiosyncratic corners that AI just sands flat.
Corn
Flat is the word. I've read AI-drafted emails that were technically correct in every way and somehow communicated nothing about the person who supposedly sent them. Which, if you're writing a terms of service document, fine. If you're writing to a colleague you've worked with for six years, that's a problem.
Herman
The scale of this is genuinely significant now. AI-generated emails went up around forty percent in 2025, which means we're at a point where a substantial chunk of professional communication is being drafted or heavily shaped by these models. So the authenticity question isn't niche anymore. It's becoming the central question of how we communicate at work.
Corn
By the way, today's episode is powered by Claude Sonnet four point six.
Herman
Which, you know, is itself in the business of writing things. So there's a certain honesty to that.
Corn
A little on the nose, but we appreciate the transparency. Daniel's question about where this breaks down is a good one—let's dive into that.
Herman
The breakdown is actually pretty predictable once you see it. These models are trained to optimize for clarity and coherence across enormous amounts of text, which means they converge on the center of the distribution. They learn what writing looks like in aggregate, and what writing looks like in aggregate is... kind of nobody. Competent, inoffensive, readable. The opposite of distinctive.
Corn
By design, they're regressing toward the mean. Which is fine if the mean is what you want.
Herman
Right, and for a lot of tasks it is. Summarizing a document, drafting a legal boilerplate, writing product descriptions. But the moment the task is relational, the moment the person reading the email has a mental model of you, that regression becomes a liability. They notice the gap.
Corn
How quickly do they notice, though? Like, is this something people consciously clock, or is it more of a vague feeling that something's off?
Herman
Mostly the latter, which is actually worse in some ways. If someone could point at a sentence and say "that's not you," you could fix it. But what usually happens is the reader just feels a slight distance they can't explain. The email was fine. They just didn't feel like they'd heard from you. And that accumulates over time in a relationship.
Corn
I think that's actually the core of what Daniel's asking. It's not really a technical question about AI. It's a question about what writing is for. When you send an email to someone you have a real relationship with, the information is almost secondary. The tone is the message.
Herman
That's a good way to frame it. There's a piece from Every this year that puts it well, the idea that AI defaults to a kind of homogenized corporate register that strips out the emotional texture. And emotional texture is precisely what signals to the reader: this came from a human who was actually thinking about you when they wrote it.
Corn
Which is why a technically perfect email can still feel cold. Everything is in order and something is missing.
Herman
The something that's missing is usually the writer. The small choices, the slightly unconventional word, the sentence that trails off in a way that's characteristic of how that person actually thinks. AI doesn't generate those by accident. You have to engineer them in somehow.
Corn
Can you give me a concrete example of what that looks like? Like, what's actually different between a real email and the AI version?
Herman
Say you're writing to a longtime client to flag a project delay. A real email from someone who knows that client might open with a reference to something specific—a joke from the last call, an acknowledgment of something the client mentioned offhand. It might use a slightly self-deprecating tone because that's how the relationship works. The AI version opens with "I wanted to reach out regarding a timeline update." But it reads like it could have been sent to anyone, because it was written for anyone.
Corn
The specificity is the signal. The fact that you remembered the joke is the whole point.
Herman
And that's not something you can prompt your way into unless you're feeding the model the context that makes the specificity possible. Which is a different problem than voice—it's about memory, about relationship history. Voice is the harder one to solve because it lives in patterns you've built up over years of writing, not in any single fact.
Corn
The question is how.
Herman
The attention mechanism is actually where this gets interesting technically. When a transformer model generates a token, it's attending to context, weighting what it's seen, and producing the statistically most coherent continuation. The problem is that "coherent" in this sense means coherent with the training distribution. It doesn't mean coherent with how you specifically write.
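The weighting step Herman describes can be shown in miniature. Here is a sketch of single-query dot-product attention using toy two-dimensional vectors; real models use learned projections, multiple heads, and much larger dimensions, so this only illustrates the softmax-weighted averaging at the core of the mechanism.

```python
import math

def attention(query, keys, values):
    """Single-query dot-product attention over key/value pairs.

    Scores each key against the query, softmaxes the scores into
    weights, and returns the weighted average of the values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                            # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights
```

Every value contributes something to the output, in proportion to its weight. That is the "balancing" Herman gets to next: a handful of pasted-in sample emails are just a few entries in a weighted average dominated by what the model absorbed in pretraining.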
Corn
It's not that the model is ignoring tone. It's that it's attending to the wrong reference point.
Herman
It has no strong prior for your voice specifically. It has a very strong prior for professional email in general. So even if you paste in a sample of your writing as context, the model is balancing that against everything it learned during pretraining, and pretraining is massive. Your fifty example emails are a whisper against that prior.
Corn
Which explains something I've noticed. You give it a sample, it picks up a few surface features, maybe you use em dashes a lot or start sentences with "Look," and it mimics those. But the deeper register, the actual way you think on the page, that stays flat.
Herman
Surface mimicry is a real trap. There's a concrete illustration of this. Take an email from someone who writes in short, punchy bursts with a dry edge. Ask a baseline model to match that style. What you get back tends to have the short sentences, maybe a clipped word here or there, but the underlying cadence is still corporate. The rhythm is off. It's like someone doing an impression by copying your vocabulary but not your timing.
Corn
The uncanny valley of prose.
Herman
The tradeoff there is real. A model that's very good at following instructions, at being helpful and clear, has been trained in ways that specifically smooth out idiosyncrasy. The RLHF process, the reinforcement learning from human feedback, it rewards outputs that raters find polished and appropriate. Idiosyncratic writing doesn't always score well with raters who don't know the person.
Corn
The training objective is working against you at the exact moment you need it to step aside.
Herman
And this is where the "why does it happen" question bottoms out. It's not a bug in the usual sense. The model is doing what it was optimized to do. The optimization just wasn't pointed at you.
Corn
There's something almost philosophical about that. The model is maximally helpful to everyone in the aggregate, which makes it minimally useful for anyone in particular.
Herman
That's the core tension. And it means the solutions have to work against the grain of how these models were built. You either shift the prior, which is what fine-tuning does, or you work with prompting strategies that force the model to weight your examples more heavily than its defaults. Neither is free.
Corn
Neither is invisible to the reader, if you don't get it right.
Herman
The Every piece from this year frames it as the homogenized corporate register problem. The model learned to write in a way that's legible to everyone, and legible to everyone means characteristic of no one. Getting out of that requires deliberate intervention at some level of the stack.
Corn
There's actually a fun historical parallel here. When typewriters became widespread in the early twentieth century, there was a whole wave of anxiety about whether typed letters felt less personal than handwritten ones. The argument was that handwriting carried personality—the pressure of the pen, the idiosyncratic letterforms—and typing flattened that. People wrote etiquette columns about when it was appropriate to type versus write by hand.
Herman
Now we're having the same argument one level up. Typing versus AI. The medium keeps changing, the anxiety about authenticity is the same.
Corn
The difference being that handwriting versus typing is at least legible as a choice. The reader can see which one you used. With AI, there's no visible signal. The email looks like an email.
Herman
Which is part of why the stakes feel higher. The deception, if you want to call it that, is invisible in a way it wasn't before.
Corn
The question becomes where in the stack you intervene, and how much friction you're willing to accept to do it. Fine-tuning, for instance, is one approach—but how practical is it?
Herman
Fine-tuning is the most direct answer, and it's gotten meaningfully more accessible. The technique that's changed the equation is LoRA, Low-Rank Adaptation, which lets you adapt a model's weights without retraining the whole thing. You're essentially adding a thin layer of adjustments on top of the base model that pulls it toward your writing patterns. The compute cost is a fraction of full fine-tuning.
Corn
The dataset for this is just...
Herman
Your emails, your messages, whatever corpus of your own writing you can assemble. There's a worked example from Vadim's blog this year where someone fine-tuned Qwen3 using LoRA specifically for cold email outreach, matching a particular sales voice. Small model, somewhere in the one-point-seven to fourteen billion parameter range, fine-tuned on domain-specific examples. The result was measurably more consistent with the target voice than the base model.
Corn
How measurable are we talking?
Herman
The claim in the literature is that fine-tuning can reduce tone inconsistencies by around sixty percent compared to a pre-trained baseline. Which is significant, but also worth parsing. Sixty percent fewer inconsistencies is not sixty percent of the way to sounding like you. Those are different things.
Corn
Right, that's the gap between "less wrong" and "actually right."
Herman
It surfaces a real practical question, which is how much data you need. If your writing corpus is thin, the fine-tuned model doesn't have much to anchor to. You get a slightly personalized version of the base model rather than something that captures your register.
Corn
What counts as thin, though? Like, is there a rough number people should be aiming for?
Herman
The guidance I've seen puts a functional floor somewhere around two to three hundred examples for LoRA on a small model. Below that you're not giving it enough signal to distinguish your patterns from the base distribution. A few hundred is achievable for most people who've been writing professionally for a few years—you probably have that sitting in your sent folder right now. The quality matters too, though. Two hundred emails where you were clearly on autopilot are worth less than fifty where you were actually trying to communicate something.
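The triage Herman describes, filtering a sent folder down to substantive examples, might look something like this sketch. The word-count threshold and the boilerplate phrase list are arbitrary placeholders, not validated guidance; the 200-300 floor echoes the figure discussed above.

```python
# Rough sketch of triaging a sent folder into a fine-tuning corpus.
# Thresholds and the boilerplate list are illustrative placeholders.

BOILERPLATE = (
    "please find attached",
    "i wanted to circle back",
    "per my last email",
)

def looks_substantive(body: str, min_words: int = 40) -> bool:
    """Keep emails long enough to carry voice and free of stock phrases."""
    text = body.lower()
    if any(phrase in text for phrase in BOILERPLATE):
        return False
    return len(text.split()) >= min_words

def build_corpus(emails: list[str], target: int = 300) -> list[str]:
    """Filter, then warn if the result falls short of a usable size."""
    kept = [e for e in emails if looks_substantive(e)]
    if len(kept) < 200:
        print(f"Only {len(kept)} usable examples; "
              "below the ~200-300 floor discussed for LoRA.")
    return kept[:target]
```

The filter is crude on purpose: any heuristic that strips autopilot emails before training is better than feeding the adapter two hundred examples of the exact register you are trying to escape.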
Corn
The people who write a lot get the most benefit, which feels a bit unfair to the people who most need the help.
Herman
There's something to that. The other approach, which requires no fine-tuning at all, is what the Every style guide piece describes as building a written spec of your voice. Not examples, but an articulated description of how you write. Sentence length preferences, what you avoid, how you open, how you close, what topics you treat with levity versus weight. You feed that as a system prompt and you're essentially giving the model a manual rather than training data.
Corn
I'd be curious how far that actually gets you. Because describing your own voice is hard. Most people don't have that kind of meta-awareness about their writing.
Herman
It's hard, and the research suggests asking AI to help you build the spec from your existing writing, which is a bit recursive but it works. Let the model analyze your emails and surface the patterns, then you edit that analysis into something accurate. That edited spec becomes your prompt template.
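The spec-to-prompt step can be as simple as string assembly. A sketch follows; the field names and example rules are hypothetical stand-ins for whatever a corrected spec actually contains.

```python
# Sketch of turning an edited voice spec into a reusable system prompt.
# The spec keys and example values below are hypothetical.

def voice_prompt(spec: dict) -> str:
    """Compress a corrected self-portrait into a system prompt block."""
    lines = ["Write in my voice. Follow these rules:"]
    for rule, detail in spec.items():
        lines.append(f"- {rule}: {detail}")
    return "\n".join(lines)

my_spec = {
    "sentence length": "short, often fragments; rarely over 20 words",
    "openers": "start with the point, never 'I hope this finds you well'",
    "register": "casual, dry; humor allowed with long-standing colleagues",
    "never": "passive voice, 'circle back', exclamation points",
}
```

The resulting block travels with every writing task as a system prompt, which is the "manual rather than training data" approach: no weights change, but the model gets an explicit target instead of its default register.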
Corn
The workflow is: AI reads your writing, tells you how you write, you correct it, then you use that corrected self-portrait to prompt future AI writing. That's a strange loop.
Herman
And what's interesting is that the correction step is where most of the value lives. People consistently find that the AI's first pass at characterizing their voice is about seventy percent right and thirty percent confidently wrong in ways that are very revealing. It'll say you write with a formal register when you actually write with a casual one, or it'll miss that you almost never use passive voice. Fixing those errors forces you to articulate things about your writing that you'd never have put into words otherwise.
Corn
You learn something about yourself in the process that you couldn't have gotten any other way.
Herman
The third approach, which Ana Canhoto's framework describes as hybrid, is the one I think has the most practical traction for most people. AI drafts, you edit. But specifically, you're not editing for content. You're editing for voice. You read the draft asking only: where does this not sound like me? And you fix those spots.
Corn
Which is a different cognitive task than writing from scratch, but still requires you to know what sounds like you.
Herman
Paul Graham's substack piece this year makes the point that asking AI for feedback rather than rewrites is often more useful for preserving voice. The model tells you where your argument is unclear, you fix it in your own words. The voice never leaves your hands.
Corn
That's the one that appeals to me most, honestly. Keep the thinking on your side, outsource the structural critique.
Herman
The tradeoff is speed. If you're writing forty emails a day, stopping to edit each one for voice isn't obviously faster than just writing them.
Corn
Authenticity and efficiency are in tension here, not just in theory.
Herman
Which is why the framework question matters. Not every email needs your full voice. The level-zero to level-four spectrum that Canhoto lays out is useful: high-authenticity contexts, personal relationship, emotional stakes, anything where the reader knows you well, those sit at the human end. Standardized, repetitive, low-relationship content can sit at the AI-heavy end. The mistake is applying the same approach to both.
Corn
Treating a vendor invoice follow-up the same as a note to a colleague you've worked with for a decade.
Herman
The cost of getting that wrong in the second case is real. The reader notices, even if they can't name what's off.
Corn
Is there a useful heuristic for drawing that line? Like, how do you actually decide in the moment which bucket an email falls into?
Herman
The one I find most useful is asking: would this person be surprised if they found out an AI wrote this? For the invoice follow-up, probably not. For the email checking in on someone after a hard conversation, yes, they'd be surprised, and that surprise would tell you something about what the email was supposed to be doing. If the answer is yes, that email needs your hands on it.
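Herman's surprise test reduces to a small decision rule. The inputs and logic below are one illustration of the idea, not a prescription from the episode.

```python
# Toy encoding of the "surprise test": would the reader be surprised
# to learn an AI wrote this? Inputs and logic are illustrative only.

def needs_human_voice(reader_knows_you: bool,
                      emotional_stakes: bool,
                      routine: bool) -> bool:
    """True when an email deserves a hands-on voice edit before sending."""
    if emotional_stakes or reader_knows_you:
        return True            # surprise likely: keep your hands on it
    return not routine         # unknown reader, one-off: err toward human
```

The invoice follow-up (routine, no stakes, no relationship) runs on autopilot; the check-in after a hard conversation does not, regardless of how well the reader knows you.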
Corn
That's a clean test. The surprise question.
Herman
You don't have to think hard about every email if you've internalized where your relationships sit on that spectrum.
Corn
Given all of that, what does a practical starting point actually look like? For someone who isn't going to spin up a GPU cluster but wants meaningfully better results.
Herman
The lowest-friction entry point is the written spec approach. Take thirty or forty of your own emails, paste them into a model, and ask it to describe how you write. Not summarize what you said, but characterize the voice. Sentence length, opening patterns, how you signal warmth or urgency, what you never do. You'll get back something that's partially right and partially wrong, and the act of correcting it is itself clarifying.
Corn
You learn something about yourself in the process.
Herman
Often things you couldn't have articulated unprompted. Then you compress that into a system prompt you reuse. Something like two hundred words that travels with every writing task. That alone, consistently applied, closes a meaningful chunk of the gap without any fine-tuning.
Corn
For people who want to go further?
Herman
LoRA fine-tuning on a small model is within reach now. The Qwen3 example I mentioned earlier, that's a model you can run and adapt without enterprise infrastructure. You need a reasonably clean corpus of your own writing, a few hundred examples minimum, and a service that handles the training run. Modal is one option for that kind of workload, actually.
Corn
The cringe test is worth mentioning here too. There's a heuristic from Serious Insights that I think is underrated: read the AI draft out loud and flag anything you'd never say. Not anything that's wrong, anything that's not you. That's the edit pass.
Herman
Reading aloud catches register problems that silent reading misses. The rhythm is wrong before you can articulate why.
Corn
I've started doing this and it's almost embarrassing how obvious the problems become. There's a whole category of phrase that looks fine on screen and sounds immediately wrong the moment you hear it. "I wanted to circle back" is the canonical one, but there are subtler versions.
Herman
The out-loud test is also useful because it catches length problems. AI tends to run long in ways that feel measured on screen but exhausting when spoken. If you're running out of breath before you hit a period, the sentence is too long for your voice.
Corn
The framework question, which email actually needs your voice and which one doesn't, that's probably the highest-leverage decision most people aren't making deliberately.
Herman
Spend your editing energy where the relationship is real. Let the rest run on autopilot.
Corn
The question I keep coming back to is what happens when this gets easier. Because right now there's still enough friction that people make deliberate choices about when to use it. When the friction disappears, and it will, does the deliberate choice disappear with it?
Herman
That's the one I don't know the answer to. You could see it go either way. Either people develop better instincts about when their voice matters and when it doesn't, or the defaults take over and we end up in a world where nobody's quite sure who actually wrote anything.
Corn
Whether that matters. Which is maybe the more uncomfortable question.
Herman
The reader's experience changes too. If you know that the person who sent you a warm, thoughtful email might have spent forty-five seconds reviewing an AI draft, does that change what the email means to you? I think it does, and I think most people haven't sat with that yet.
Corn
It's the difference between a gift and a gift card. Both are fine, but they're not the same thing.
Herman
The analogy extends further than it seems. A gift card isn't worse because it took less time. It's different because it signals a different kind of attention. You didn't think about what the person would specifically love. You thought about the category. AI email can be the same thing—you thought about the type of email this is, not about the specific person receiving it.
Corn
Some people are fine with gift cards. The question is whether you know which kind of relationship you're in.
Herman
Which brings it back to Daniel's original question in a way. The technical problem is real, but underneath it is a question about what you owe the people you're writing to. The tools are just surfacing a decision that was always there.
Corn
The open question for listeners, the one worth sitting with, is where your own line is. Not in the abstract, but specifically. Which emails in your life would you want to know were written by a person, and which ones don't matter?
Herman
That's a good one to take away. Thanks to Hilbert Flumingtop for keeping this whole operation running, and to Modal for the compute that makes the pipeline work. By the way, today's episode was written by Claude Sonnet four point six, which feels appropriate given the subject matter.
Corn
This has been My Weird Prompts. If you've got a moment, a review on Spotify goes a long way. We'll see you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.