#4053: How to Make AI Write Prose, Not Bullet Points

Why LLMs default to lists and how to force them into flowing, professional prose.

Featuring
Listen
0:00
0:00
Episode Details
Episode ID
MWP-4232
Published
Duration
22:29
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Large language models don't just prefer bullet points — they're statistically addicted to them. The default output structure of most LLMs is a cascading list of dashes and numbered items, a pattern deeply embedded by three converging forces: training data dominated by web-scraped lists, RLHF reward models that favor scannable outputs, and attention mechanisms that find structural delimiters like line breaks and dashes to be low-entropy, low-surprise paths.

This creates a credibility problem for enterprises deploying AI to write reports, memos, and executive summaries. Expert readers interpret bullet-point formatting as pre-digested simplification — the structural equivalent of someone speaking slowly and loudly. The solution isn't negative prompting ("don't use bullet points"), which fails 30-40% of the time because negation is a weak signal in transformer architectures. Instead, effective prose steering requires positive constraints: dense system prompts that specify paragraph structure, topic sentences, and transitions, paired with few-shot examples that create new low-entropy paths for the model to follow.

For production-scale reliability, the engineering levers span a spectrum. System prompting with examples achieves roughly 60% reliability. Fine-tuning on curated datasets of dense prose can push that to 90% but requires thousands of dollars in compute and significant data curation effort. Textual LoRAs offer a lightweight, toggleable middle ground — effective for style steering without degrading general capabilities. The key insight: you can't fight the bullet-point valley by nudging toward the prose hill. You have to bulldoze a new channel.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3
Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#4053: How to Make AI Write Prose, Not Bullet Points

Corn
Last episode we gave listeners a tip — if you want to sound like a bot, lean into bullet points. Daniel's prompt this week basically grabs that tip and yanks it in the opposite direction. He wants to know why large language models default so aggressively to bullet-point output, why that's actually a real problem in professional contexts, and what engineering levers exist to force them toward prose-first writing — the kind that reads like a McKinsey white paper, not a dummies' guide. He's asking us to compare system prompting, fine-tuning, and something he calls textual LoRAs, which I'll admit I had to look up.
Herman
You looked it up because you wanted me to explain it to you before we recorded.
Corn
I looked it up because I value our friendship and didn't want you to have to do all the work.
Herman
That's very generous. And also completely false. But Daniel's question is genuinely important right now. Enterprises are deploying AI agents to write reports, memos, executive summaries — and the default output style is this relentless bullet-point cascade that reads like a first-year consultant discovered the formatting toolbar. It creates a credibility gap. Senior readers see that structure and their brains flag it as simplified, reductive, maybe even untrustworthy.
Corn
Right — it's the formatting equivalent of someone speaking slowly and loudly at you. The information might be correct, but the presentation signals "I assume you need this dumbed down." And Daniel's observation about our own scriptwriting agent is the perfect springboard here. We taught listeners how to mimic bot-like structure. Now we're asking the reverse engineering question: how do you force a bot to stop writing like a bot?
Herman
The answer turns out to be a lot more interesting than "just tell it not to use bullet points." Which, by the way, barely works. There's a real technical reason for that, and it connects all the way down to how these models process structural tokens during generation.
Corn
Where do we start — the "why" or the "how to fix it"?
Herman
We have to start with the why. Because if you don't understand why the model is pathologically addicted to bullet points, none of the fixes make sense. You'll just be throwing prompts at a wall.
Corn
Walk me through it. Why does every language model on earth seem to wake up and choose bullet points?
Herman
Three things are happening at once. First, the training data. These models are trained on enormous scrapes of the open web — Wikipedia, Stack Overflow, documentation sites, how-to guides, cooking blogs that give you a six-paragraph life story before the recipe. A huge fraction of that text uses bullet points and numbered lists as structural scaffolding. The model learns, at a statistical level, that when you're explaining something or presenting information, the natural shape of that explanation is a list.
Corn
It's not that the model "prefers" bullet points in any intentional sense — it's that the training distribution makes list structures the path of least resistance.
Herman
The second factor is RLHF — reinforcement learning from human feedback. When these models go through preference tuning, human raters consistently reward outputs that are concise, scannable, and easy to verify at a glance. Bullet points score high on all three. A dense paragraph requires actual reading. A bullet list lets the rater tick through claims quickly. So the reward model reinforces the behavior.
Corn
Which means the very process that makes these models "helpful" is also what makes them write like a rushed analyst who's terrified you'll stop reading.
Herman
The third factor is the one most people don't think about — the attention mechanism itself. Bullet points, dashes, numbered items act as structural delimiters. They reduce what you might call format entropy — the model's uncertainty about how to organize tokens. Starting a new line with a dash is a low-surprise move. The model has seen it millions of times. It's the syntactic equivalent of walking downhill.
Corn
When you type "summarize the quarterly results" and the model sees that prompt, the statistically coziest next move is a colon followed by a line break and a dash. Not because it reasoned about formatting — because the probability landscape tilts that way.
Herman
And this is why negative prompting fails so reliably. When you say "don't use bullet points," the model still activates all the same structural pathways — the dash token, the line break, the numbered list — because negation is a weak signal in transformer architectures. Studies put the failure rate on simple negation instructions around thirty to forty percent. You're asking the model to suppress its strongest statistical instinct with a "please don't.
Corn
The cost here is not just aesthetic. If you're sending an AI-generated market analysis to a board, and it arrives looking like a BuzzFeed listicle, you've lost something before anyone reads a word.
Herman
Expert readers — the kind who read McKinsey reports or white papers — expect prose that builds an argument. Topic sentence, supporting evidence, transition. Bullet points signal summarization, and summarization signals simplification. The reader's brain treats it as pre-digested.
Corn
Which sets up Daniel's real question. If the default is this deeply embedded, what levers do we actually have? He named three: system prompting, fine-tuning, and this textual LoRA approach.
Herman
They sit on a spectrum. System prompting is the quick fix — you write better instructions, you add an example, you cross your fingers. It works maybe sixty percent of the time. Fine-tuning is the nuclear option — expensive, data-hungry, and risky because you can degrade the model's general capabilities. Textual LoRAs sit in this fascinating middle ground — lightweight, toggleable, and surprisingly effective for style steering without breaking everything else.
Corn
The rest of this conversation is basically: how do we climb back up the probability hill the model keeps sliding down?
Herman
Let's start with the negative prompting failure, because it's the most counterintuitive. You'd think saying "don't use bullet points" would work. It's a simple instruction. But what's actually happening under the hood is that the model generates tokens sequentially, and by the time it reaches the structural decision point — colon, then what? — the statistical weight of every training example where a colon is followed by a line break and a dash is bearing down on it. The word "don't" in your prompt is just one more token in a sea of tokens. It gets diluted.
Corn
The model isn't disobeying. It's just that "don't" is a feather, and the training data is a freight train.
Herman
That's the metaphor. And it gets worse. When you write "don't use bullet points," you've still activated all the token pathways associated with bullet points. The model is now thinking about bullet points. You've primed the very behavior you're trying to suppress. It's like telling someone "don't think about elephants.
Corn
Which is why our scriptwriting agent works, and a naive "please write prose" prompt doesn't. The scriptwriting agent never mentions bullet points at all.
Herman
The system prompt for the scriptwriting agent does two things simultaneously. First, it gives positive prose constraints — "write in flowing paragraphs," "use conversational turn-taking," "vary sentence length." Second, it includes a few-shot example of what the output should actually look like. That dual signal is what overrides the bullet-point prior. You're not fighting the model's instincts — you're giving it a different instinct to follow.
Corn
The abstract instruction says what you want. The example shows what you want. And the model, being a pattern-matching engine, latches onto the pattern.
Herman
The pattern is specific enough that it creates a new low-entropy path. Remember, the model gravitates toward bullet points because that path is well-worn and predictable. If you provide a prose example with clear structural markers — topic sentences, transitions, paragraph breaks — you're essentially paving a new path and making it just as statistically comfortable to walk down.
Corn
That format entropy concept you mentioned earlier — I want to sit with that for a second. You're saying the model has an internal probability distribution over what comes next, and structural tokens like dashes and line breaks are heavily weighted because they've been reliable predictors in training.
Herman
Think of it as a landscape. The bullet-point valley is deep and wide. The prose hill is steep. Most prompts just nudge the model gently toward the hill. It takes one look at the valley and slides right back down. What a good system prompt with a few-shot example does is essentially bulldoze a new channel. It doesn't just nudge — it reshapes the probability surface for that specific generation.
Corn
Which explains why the before-and-after difference can be so stark. I'm imagining a typical scenario — someone prompts a model with "analyze the competitive landscape for electric vehicles." The naive output starts with a colon, then a dash, then five more dashes. It's a list of competitors with one-line descriptions. Functional, but reads like meeting notes.
Herman
Now take the same query with a structured prose constraint and an example. You get something that opens with a topic sentence — "The electric vehicle market is undergoing a structural shift from technology differentiation to manufacturing scale as the primary competitive moat." Then it builds the argument across three paragraphs, weaving the competitors into the narrative rather than listing them. Same information, completely different credibility profile.
Corn
The scriptwriting agent is the proof this works at production scale. It's not just generating a paragraph or two — it's generating four thousand words of conversational dialogue, every episode, with consistent formatting. The bullet-point impulse never breaks through because the positive constraints are so heavily reinforced.
Herman
The technical parameters there are instructive. The system prompt doesn't just say "write dialogue." It specifies turn structure, personality traits, pacing rules, prohibitions on certain patterns. It's a dense web of positive instructions, and then it includes actual examples of the desired output format. That combination creates enough steering force to overcome what would otherwise be an overwhelming statistical pull toward structured lists.
Corn
For Daniel's listener who wants McKinsey-style reports, the lesson from the scriptwriting agent is: don't tell the model what to avoid. Tell it what to build. Give it the scaffolding — topic sentence, evidence, transition — and show it a sample paragraph that embodies the target style.
Herman
That gets you maybe sixty percent reliability, which is fine for many use cases. But if you need deterministic prose style across hundreds of outputs — quarterly reports, client deliverables, anything where a bullet-point relapse would be embarrassing — that's where you start looking at the heavier engineering levers.
Herman
Fine-tuning is the one everyone thinks of first, and I understand why. You curate a dataset of, say, two hundred McKinsey-style reports — the real thing, dense prose, layered arguments, zero bullet points unless absolutely necessary — and you retrain the model on those. The output style shifts dramatically. You can hit ninety percent reliability on prose-first formatting.
Corn
Two hundred reports sounds like a lot of PDFs to hunt down and a lot of manual reformatting.
Herman
And that's before you even touch a GPU. The compute cost for full fine-tuning on a model like DeepSeek V4 Pro runs into thousands of dollars, and you need someone who actually knows how to set up the training pipeline without introducing weird artifacts. But the bigger risk is catastrophic forgetting.
Corn
Which is the technical term for "congratulations, your model now writes beautiful prose and has forgotten how to do math.
Herman
That's not even an exaggeration. When you fine-tune on a narrow stylistic dataset, you're adjusting the model's weights across the board. The prose style improves, but factual recall, reasoning, quantitative analysis — those can degrade because the training signal is optimizing for something orthogonal to correctness. You end up with a model that sounds authoritative and is subtly wrong more often.
Corn
You've traded bullet points for elegant misinformation.
Herman
Which is arguably worse. At least bullet points signal "this might be oversimplified." Beautiful prose that's factually shaky is harder to catch.
Corn
Alright, so that brings us to the third option — the one Daniel flagged that I had to look up. What are they, and why do they avoid the fine-tuning trap?
Herman
LoRA stands for Low-Rank Adaptation. It was introduced in twenty twenty-one by a team at Microsoft Research, and the core idea is elegant. Instead of updating all the model's weights during training, you train a small set of additional parameters — a lightweight adapter — that sits alongside the base model and steers its outputs. The base model stays frozen. Only the adapter learns.
Corn
The model's general capabilities are preserved because you're not touching them.
Herman
And because the adapter is small — we're talking a few megabytes for a rank-eight LoRA — you can train it on fifty to a hundred examples, on a single GPU, in a couple of hours. Then at inference time, you load the adapter alongside the base model, and it shifts the output distribution toward your target style.
Corn
DeepSeek V4 Pro supports this natively?
Herman
DeepSeek V4 Pro, released earlier this year, has native LoRA adapter support with minimal latency overhead. You can toggle the adapter on and off per request. That toggleability is a bigger deal than it sounds. You might want bullet points for an internal summary and prose for the client-facing version of the same analysis. With a LoRA, you don't need two separate models. You just flip a switch.
Corn
Walk me through what building one of these actually looks like. Daniel's listener wants McKinsey-style reports. What's the pipeline?
Herman
Step one, you collect fifty to a hundred prose-heavy reports in your target style. These don't need to be perfect — just representative. Step two, you format each one as an instruction-output pair. The instruction is something like "analyze the competitive dynamics of the semiconductor supply chain," and the output is the full prose report. Step three, you train a rank-eight LoRA on DeepSeek V4 Pro for about a hundred steps. That's maybe two hours on an A100. Step four, you deploy the adapter and test it on prompts the model has never seen.
Corn
The reliability difference versus system prompting?
Herman
System prompting alone, with a good few-shot example, gets you around sixty percent reliability — meaning six out of ten outputs stay prose-first without bullet-point leakage. A textual LoRA pushes that to about eighty-five percent. Fine-tuning can hit ninety, but at twenty times the effort and with the catastrophic forgetting risk. The LoRA is the sweet spot.
Corn
The decision framework is basically: if you're generating the occasional report and can tolerate spot-checking the output, system prompting plus a strong example is fine. If you need deterministic prose style at scale — dozens of reports a week, client-facing deliverables, anything where a formatting failure would be embarrassing — invest the twenty hours in building a LoRA.
Herman
I'd add a hybrid approach that most teams overlook. Build the LoRA for your critical outputs, but also maintain a strong system prompt as your default. The prompt handles eighty percent of use cases cheaply. The LoRA is your insurance policy for the high-stakes twenty percent. You get the best of both without paying the fine-tuning tax.
Corn
There's something philosophically satisfying about the LoRA approach too. You're not rewriting the model's personality. You're giving it a stylistic lens it can put on and take off. It's the difference between surgery and a well-tailored jacket.
Herman
If I had to give Daniel's listener a decision tree, it's three branches. Default to system prompting with a few-shot example — that's your baseline. It's fast, it's cheap, and for most internal use cases it's good enough. If you're generating client-facing reports or anything where a bullet-point relapse would make you look sloppy, build a textual LoRA. And only reach for full fine-tuning if you're also adapting domain knowledge — like teaching the model a specialized industry vocabulary alongside the prose style.
Corn
That last point matters more than people realize. Fine-tuning isn't just overkill for style — it's the wrong tool. You're rebuilding the house because you didn't like the paint color.
Herman
Paying a contractor who might knock out a load-bearing wall. So let's make this concrete. Here's a prompt template that anyone can copy and adapt.
Herman
"Write in flowing paragraphs with clear topic sentences. Use bullet points only when listing three or more discrete items. Begin each paragraph with a claim, then support it. Here is an example of the desired style:" — and then you insert a three or four sentence paragraph that embodies the prose you want. That's it. Four sentences of instruction, one example paragraph, and you've just given the model a structural alternative instead of a prohibition.
Corn
That last part is the key insight. You're not saying "don't do the thing you're statistically wired to do." You're saying "here's a different thing, it has its own shape and rhythm, do this instead." The model gets to follow a pattern either way — you've just swapped which pattern it follows.
Herman
The example paragraph does more work than the instructions. The model reads your sample and extracts the structural DNA — topic sentence, evidence, transition, repeat. It doesn't need you to explain paragraph construction. It just needs to see one.
Corn
Which circles back to something Daniel has said before about examples being crucial in prompts. Abstract instructions are a sketch. An example is a photograph. The model is better at tracing than imagining.
Herman
The practical takeaway is: spend ten minutes writing one good example paragraph in your target style. That's the highest-leverage ten minutes in the entire workflow. Everything else — the LoRA training, the dataset curation — builds on that foundation. If you can't articulate what good prose looks like in a single example, you're not ready to automate it.
Herman
There's one question I keep turning over, though. As models get better at instruction-following — and we've seen real gains in the past eighteen months — does the bullet-point bias fade on its own? Or is it baked so deep into the training distribution that no amount of alignment tuning will dislodge it?
Corn
I think it's baked. Not because the architecture demands it, but because the internet is structured that way. Every documentation page, every how-to article, every listicle — that's the training corpus. You can't alignment-tune your way out of the shape of the web.
Herman
There's a twist I don't think most people have clocked yet. We're entering this era of agentic workflows — AI writing to other AI. An analyst agent drafts a report, a summarizer agent condenses it, a briefing agent pulls out action items. In that pipeline, bullet points are actually the optimal format. They're parseable, they're structured, they minimize ambiguity for the next model in the chain.
Corn
The very thing that annoys human readers is a feature for machine-to-machine communication. The prose-versus-bullet debate might not have a single answer — it might just depend on who's reading.
Herman
Which means the skill Daniel's listener is building isn't "how to make AI write prose." It's "how to make AI write prose when the audience is human." That's a more interesting capability — contextual format switching based on the reader, not the content.
Corn
That's probably where this is all heading. Not one default style, but models that modulate format based on context. Bullet points for the summarizer agent downstream. Flowing prose for the board deck. Both from the same model, toggled by something as lightweight as a LoRA.
Herman
The LoRA becomes the stylistic gearshift. And the prompt is just the key that starts the engine.
Corn
We should land this. Next episode, we're digging into something I've been quietly obsessed with — the economics of AI inference at scale, and why the "rent versus build" decision is getting weirder by the month. Daniel's already sent the prompt and it's a good one.
Herman
I've got spreadsheets.
Corn
Of course you do. Thanks as always to our producer Hilbert Flumingtop for keeping this operation running.
Herman
Now: Hilbert's daily fun fact.

Hilbert: In sixteen eighty-one, a Dutch trader crossing the Karakum Desert near modern-day Turkmenistan recorded finding a perfectly preserved mammoth tusk protruding from thawing permafrost — the earliest documented observation of permafrost methane seeps in Central Asia, though he had no idea what he was looking at and described it as "the earth exhaling frozen breath.
Corn
...the earth exhaling frozen breath. That's actually kind of beautiful.
Corn
This has been My Weird Prompts. Find every episode at my weird prompts dot com, or email the show at show at my weird prompts dot com. We'll be back next week.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.