#2345: Why File Naming Conventions Are More Than Just Style

Discover how file naming conventions like snake_case and camelCase impact development workflows, CI/CD pipelines, and filesystem compatibility.

Featuring

Daniel

Corn

Herman

Episode Details

Episode ID: MWP-2503
Published: Apr 20
Duration: 24:47
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: Claude Sonnet 4.6
Topics: software-development version-control file-naming-conventions

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

File naming conventions are often dismissed as mere stylistic preferences, but they play a pivotal role in development workflows, CI/CD pipelines, and filesystem compatibility. This episode explores the taxonomy of naming conventions—snake_case, camelCase, PascalCase, kebab-case, SCREAMING_SNAKE_CASE, and Train-Case—detailing their origins, ecosystem preferences, and practical implications.

Each convention emerged from specific constraints. For instance, snake_case traces back to C and Python, where underscores provided a clean separator that parsers could reliably interpret. camelCase and PascalCase, rooted in Algol and Pascal traditions, became staples in Java and C#, with PascalCase signaling types or classes in TypeScript. SCREAMING_SNAKE_CASE, used for constants in Unix shell scripts, emphasizes visual distinctiveness to enforce discipline.

The episode highlights the machine-safety concerns tied to filenames, such as case sensitivity across filesystems. Developers often work on case-insensitive systems like macOS’s APFS or Windows’ NTFS, while production servers typically run case-sensitive systems like Linux’s ext4. This mismatch can lead to latent failures, where a renamed file works locally but breaks in CI/CD pipelines or on different operating systems. Git’s handling of case-insensitive renames further complicates matters, as it may not track changes that only surface in case-sensitive environments.

The discussion underscores the importance of treating filenames as interfaces rather than labels. Conventions serve as shared contracts between developers, tools, and downstream processes, ensuring reliability and reducing cognitive friction. By understanding the tradeoffs and constraints behind each convention, developers can make informed choices that align with their ecosystems and avoid costly errors.

Ultimately, file naming conventions are more than just style—they’re architectural decisions that impact the robustness and maintainability of codebases. This episode offers practical insights and actionable advice for navigating this often-overlooked aspect of software development.

Mentions

C#: Programming language with PascalCase conventions
Claude Sonnet 4.6: AI model that powers the script
Docker: Containerization platform for consistent environments
Git: Version control system with case sensitivity quirks
Node.js: JavaScript runtime using camelCase conventions
PEP 8: Python style guide for naming conventions
pre-commit: Tool for managing git pre-commit hooks
React: JavaScript library using camelCase conventions
ShellCheck: Static analysis tool for shell scripts
TypeScript: Programming language with PascalCase conventions

A team pushes a rename to their repo: capital S on a shell script, Script.sh instead of script.sh. On their MacBooks, running APFS, nothing breaks. The filesystem doesn't even register that anything changed. They merge, the CI runner spins up on Linux, ext4, case-sensitive, and it can't find the file. Not because the code was wrong. Because a letter changed case.

I'm Herman Poppleberry, and that story is not hypothetical. That exact failure mode — I keep wanting to say failure mode, but let's just call it what it is, that exact way things broke — shows up constantly in post-mortems. And it's almost never the first thing people look for, which is part of what makes it so costly.

Daniel sent us this one, and I think he framed it well. The question is essentially: why do file naming conventions and machine-safe naming practices matter more than most developers actually treat them? He wants us to cover the full taxonomy — kebab-case, snake_case, camelCase, PascalCase, Train-Case, SCREAMING_SNAKE_CASE — where each one comes from, what ecosystems favor it, when to reach for which. And then the deeper layer: the practical machine-safety concerns. Spaces and special characters in filenames, case sensitivity across filesystems, length limits, reserved characters, Unicode hazards, emoji in paths. And what all of that means when it actually breaks — shell scripts, glob expansion, Git on a case-insensitive filesystem, CI/CD pipelines going down. The underlying principle he's pushing toward is that files aren't labels. They're interfaces.

Which is the reframe that I think unlocks everything else. Once you think of a filename as an interface — something that other systems, other processes, other humans working programmatically will consume — the question of what you name it stops being aesthetic and becomes architectural.

By the way, today's episode is powered by Claude Sonnet four point six.

Good to know our script has impeccable taste.

The CI/CD example is worth sitting with for a second before we get into taxonomy, because it illustrates something specific. It wasn't a logic error. It wasn't a dependency failure. It was a filesystem disagreement about whether two strings were the same string.

The insidious part is that Git, on a case-insensitive filesystem, will not track that rename as a rename. It sees no change. So the developer who made the change has no indication anything went wrong. Their local tests pass. Their colleague's tests pass. Everyone on macOS is fine. The problem only materializes when the code hits a system with different assumptions baked into the kernel.

The failure was latent. It existed the moment the rename happened. It just didn't surface until the environment changed.

And that latency is what makes poor file naming dangerous in a way that a syntax error isn't. A syntax error fails immediately and loudly. A filename that violates machine-safety assumptions can sit quietly in a repository for months before it detonates in production, or in a deployment pipeline, or when someone tries to run the project on a different operating system.

How long are we talking, realistically? Like, in a real team, how long could something like that sit before it surfaces?

Indefinitely, if the team is homogeneous enough. If everyone is on macOS, running the same CI image, never touching Windows — you could go years. The trigger is usually something environmental. You migrate your CI from one provider to another, you onboard a developer who runs Linux locally, you upgrade your Docker base image and the new one uses a different filesystem configuration. Something shifts in the environment and suddenly a latent assumption becomes a live failure.

Which is a pretty good argument for treating this as infrastructure, not housekeeping.

Infrastructure is exactly the right frame. And the taxonomy of naming conventions is where that becomes concrete, because each convention exists for a reason that's grounded in what the consuming system expects.

Right, and I think that's the thing people miss. They see snake_case versus camelCase as a style preference, like tabs versus spaces, something to argue about and then forget. But the conventions map to actual ecosystem constraints.

The Wikipedia article on programming naming conventions traces snake_case back to C — specifically to Kernighan and Ritchie's original work in 1978. The underscore was the separator that worked cleanly in identifiers when spaces obviously couldn't, and Python inherited that lineage hard. The standard library, PEP 8, all of it.

Kebab-case is the URL-friendly cousin. Hyphens instead of underscores, which is why you see it everywhere in web contexts — CSS class names, URL slugs, Lisp, which predates most of this by decades.

CamelCase and PascalCase come out of the Algol and Pascal traditions, which fed directly into Java and C sharp. PascalCase is literally named after the Pascal language. And the distinction between camel and Pascal — whether the first word is lowercase or capitalized — sounds trivial until you're in a codebase where the convention signals whether something is a variable or a type.

There's actually a fun piece of trivia here. The term camelCase itself wasn't widely standardized until the nineties, even though the style had been in use for decades. Different communities called it different things — InterCaps, BumpyCaps, WikiCase if you were in that world. The camel metaphor only stuck because it was the most evocative. You look at the humps in the middle of the word and it just clicks.

WikiCase is a good one because it shows how the same convention gets reinvented independently when the constraint is the same. Wiki software needed page names that were both human-readable and automatically linkable without special syntax. So you smash the words together with capital letters and the software can detect word boundaries. Same underlying problem as a parser that can't handle spaces, same solution.

SCREAMING_SNAKE_CASE is the one that announces itself. All caps, underscores, constants only. Unix environment variables. If you see MAX_RETRIES in a shell script, you know immediately what it is and you know not to reassign it mid-execution.

Which is the point of the convention. It's communicating to the reader and to the tooling simultaneously. Machine-safe naming works the same way — it's not just about what the filesystem accepts, it's about what downstream processes can reliably parse without you having to handle edge cases.

The conventions are a shared contract.

A shared contract between the developer who names the file, the tools that consume it, the CI system that runs against it, and the next developer who has to work with it six months later without any context.

Contracts have consequences when you break them. So let's actually work through what each convention is doing mechanically, because I think the tradeoffs become visible when you look at them that way.

Start with snake_case, because it's probably the cleanest example of a convention that emerged from a hard constraint. The underscore was never going to be misinterpreted by a parser. It's not a mathematical operator, it's not a path separator, it's not a shell metacharacter. It just sits there, inert, doing its job.

Which is why Python leaned into it so completely. You look at something like the requests library — get, post, send_request, response_headers — everything lowercase, everything underscored. There's no ambiguity about what the tokenizer sees.

The readability argument is real. For long identifiers, underscores are genuinely easier to scan than camel humps. calculate_total_invoice_amount is easier to parse at a glance than calculateTotalInvoiceAmount, at least for most readers.

Though JavaScript developers would fight you on that.

They would, and not without reason. camelCase in JavaScript is load-bearing. The language itself, the DOM API, every major framework — React, Vue, Node — uses camelCase for variables and functions. It's so deeply embedded that violating it reads as a bug, not a preference. If you're writing a React component and you name a prop background_color instead of backgroundColor, someone is going to think something went wrong.

The convention carries semantic weight. It signals which ecosystem you're operating in.

PascalCase takes that further. In TypeScript, in C sharp, PascalCase on an identifier is a strong signal that you're looking at a type or a class, not a variable. MyComponent, UserProfile, InvoiceService. The capitalization is doing type-system communication before you even read the definition.

That's actually enforced by some linters, right? It's not just convention at that point — the tooling will flag it.

In TypeScript with strict ESLint rules, yes. There are rules that specifically require PascalCase for type aliases and interfaces and will throw a warning if you deviate. So the convention has been promoted from social agreement to automated enforcement. Which is exactly where you want it.

Train-Case is the one that doesn't come up as often in these conversations. It's the HTTP header convention: Content-Type, Accept-Encoding, X-Request-ID. It's kebab-case with the first letter of each word capitalized.

Right, and it exists almost entirely in that one context. You wouldn't use Train-Case for a Python variable or a JavaScript function. It's domain-specific in a way the others aren't. If you see it somewhere unexpected, that's actually a signal that something's probably wrong.

SCREAMING_SNAKE_CASE — the interesting thing about it is that it's the only convention where the visual weight is intentional by design. It's supposed to stand out.

Defensive programming through typography, almost. The all-caps is saying: this is a constant, treat it with respect, do not shadow it, do not reassign it. In Unix shell scripts, environment variables like PATH, HOME, MAX_CONNECTIONS — the convention enforces a discipline that the language itself often doesn't enforce mechanically.

Each of these conventions is solving a specific problem in a specific context. The mistake is importing one into an ecosystem that expects another.
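A minimal sketch of the six conventions just described, in Python. The helper name to_conventions and the sample words are illustrative assumptions, and the function expects a non-empty list of plain words.

```python
def to_conventions(words):
    """Render a non-empty list of words in each convention discussed above."""
    lower = [w.lower() for w in words]
    caps = [w.capitalize() for w in lower]
    return {
        "snake_case": "_".join(lower),
        "kebab-case": "-".join(lower),
        # camelCase keeps the first word lowercase; PascalCase capitalizes it.
        "camelCase": lower[0] + "".join(caps[1:]),
        "PascalCase": "".join(caps),
        "Train-Case": "-".join(caps),
        "SCREAMING_SNAKE_CASE": "_".join(w.upper() for w in lower),
    }


names = to_conventions(["calculate", "total", "invoice", "amount"])
for convention, rendered in names.items():
    print(f"{convention:22} {rendered}")
```

Seeing the same four words side by side makes the point that the conventions differ only in separator and capitalization, which is why the choice is driven by what the surrounding ecosystem expects.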

Which happens constantly. A Python developer who's been writing snake_case for years joins a TypeScript project and names everything with underscores. The code still runs, but every other developer on the team has to do a small cognitive translation every time they read it. That friction compounds.

In filenames specifically, the stakes are higher than in code identifiers, because the filesystem doesn't know which language you're using.

The filesystem is the great equalizer. It doesn't care about your language idioms or your team's style guide. It has its own rules, and they vary depending on which filesystem you're actually sitting on.

Which is where things get treacherous. Because most developers work on one machine, with one filesystem, and they build up intuitions that are just... wrong in other contexts.

The three you need to understand are ext4, APFS, and NTFS. ext4 is the default on most Linux systems. It's case-sensitive. Foo.txt and foo.txt are two different files. APFS is what macOS has been running since 2017 and it's case-insensitive by default, though you can format a volume as case-sensitive if you know to ask. NTFS, Windows, also case-insensitive by default. So you have this situation where the majority of developer laptops are running case-insensitive filesystems, and the majority of production servers are running case-sensitive ones.

That's a structural mismatch baked into the industry.

Git doesn't paper over it cleanly. If you rename a file from utils.js to Utils.js on macOS, Git on that filesystem sees no change. The rename simply does not register. You have to use git mv with the dash dash force flag to make it stick, or rename it to something else entirely and then rename it back. It's awkward.

If someone doesn't know to do that, they commit what they think is a rename, push it, and the CI runner on Linux tries to find Utils.js and finds utils.js instead, which is a different file, which may or may not exist.

The pipeline breaks with a file not found error, and the developer is staring at their screen thinking, but it's right there. I can see it. Because on their machine, it is right there.

That's a particularly cruel debugging experience.
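One way to catch this before it reaches CI is to look for paths that differ only by case. The sketch below is a hedged illustration in Python: find_case_collisions and the sample path list are hypothetical, and str.casefold is only an approximation of how a real case-insensitive filesystem compares names.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that would be the same file on a case-insensitive filesystem."""
    groups = defaultdict(set)
    for path in paths:
        # casefold() approximates case-insensitive comparison; real filesystems
        # apply their own folding rules.
        groups[path.casefold()].add(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical list of tracked paths, e.g. what a repository listing might return.
tracked = ["src/utils.js", "src/Utils.js", "README.md", "script.sh"]
collisions = find_case_collisions(tracked)
print(collisions)
```

Any non-empty result means two entries that Linux treats as distinct files would collapse into one on a default macOS or Windows checkout.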

And the length limits add another layer. NTFS supports filenames up to two hundred and fifty-five characters. ext4 supports up to two hundred and fifty-five bytes. Those sound equivalent until you introduce Unicode, because a single Unicode character can be two, three, or four bytes. So a filename that's two hundred characters long in a script using multi-byte characters might be perfectly legal on NTFS and blow the limit on ext4.

Nobody tests for that. Nobody is sitting there counting bytes in their filenames.

Until a deployment script hits a path that's too long and fails silently or throws an error that doesn't obviously point to the filename length as the cause.
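The byte-versus-character gap is easy to see in a few lines of Python. This is a sketch that assumes filenames are stored as UTF-8 on ext4; the constant and function names are illustrative.

```python
EXT4_MAX_NAME_BYTES = 255  # per-filename limit on ext4, measured in bytes


def name_lengths(filename):
    """Return (character count, UTF-8 byte count) for a filename."""
    return len(filename), len(filename.encode("utf-8"))


def fits_ext4(filename):
    """True if the UTF-8 encoded name fits within the ext4 byte limit."""
    return len(filename.encode("utf-8")) <= EXT4_MAX_NAME_BYTES


ascii_name = "a" * 200 + ".txt"
cjk_name = "報" * 200 + ".txt"  # each of these characters is 3 bytes in UTF-8

print(name_lengths(ascii_name), fits_ext4(ascii_name))
print(name_lengths(cjk_name), fits_ext4(cjk_name))
```

Both names are 204 characters long, but only the ASCII one stays under 255 bytes once encoded.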

What about reserved characters? Because Windows has a list that I think surprises people who've only worked on Unix.

It's substantial. On Windows with NTFS, you cannot use a forward slash, backslash, colon, asterisk, question mark, double quote, less-than, greater-than, or pipe in a filename. That's nine characters that are either path separators, shell metacharacters, or redirects. Unix is more permissive — technically the only truly forbidden characters in a filename on ext4 are the forward slash and the null byte. Everything else is legal, which is precisely the problem.

Because legal on the filesystem and safe in a shell are completely different things.
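As a rough illustration, the nine characters listed above can be checked with a small Python helper. The names here are hypothetical, and the check covers only those nine characters, not every Windows naming rule.

```python
# The nine characters named above: / \ : * ? " < > |
WINDOWS_RESERVED_CHARS = set('/\\:*?"<>|')


def windows_reserved_chars_in(filename):
    """Return the reserved characters that appear in a single filename, sorted."""
    return sorted(set(filename) & WINDOWS_RESERVED_CHARS)


print(windows_reserved_chars_in("report: Q1?.csv"))
print(windows_reserved_chars_in("quarterly-report.csv"))
```

A name that returns an empty list here is still not guaranteed to be shell-safe, which is the distinction the conversation turns to next.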

A filename with a space in it is perfectly legal on every major filesystem. It will also break any shell script that isn't quoting its variables correctly. And most shell scripts, if we're being honest, are not quoting their variables correctly everywhere.

Someone names a file quarterly report.csv, and then a script tries to process it and the shell interprets quarterly and report.csv as two separate arguments.

Looping over a glob is where it bites. If you have a directory with files named report 1.txt, report 2.txt, report 3.txt, and you write a script that does something like for file in star dot txt, the glob itself hands the loop three intact names. But the moment the loop body uses dollar file without quotes, the shell word-splits on the spaces, and suddenly your commands are seeing report, 1.txt, report, 2.txt: six tokens instead of three files.

Which is a bug that only appears when the filenames have spaces, so it works fine in testing with clean names and detonates in production when a user uploads something with a normal human-readable name.

The fix in the shell script — quoting your variable in double quotes — is one of those things that feels like a minor style point until it isn't. ShellCheck, the static analysis tool for shell scripts, will flag unquoted variables, and this is exactly why. The tool exists because the failure mode is so common and so non-obvious.

That's actually a good example of the linter doing work that the runtime won't. The shell will happily execute the broken version. It just won't do what you meant.
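The token count can be reproduced outside the shell. The Python sketch below only simulates the effect: plain whitespace splitting stands in for what the shell does to an unquoted expansion, and the standard library's shlex module stands in for POSIX-style quoting. The file list is the example from the conversation.

```python
import shlex

files = ["report 1.txt", "report 2.txt", "report 3.txt"]

# Unquoted: names joined by spaces and split on whitespace, roughly what
# happens to an unquoted variable in a shell command.
unquoted_tokens = " ".join(files).split()

# Quoted: shlex.quote wraps each name so a POSIX-style tokenizer keeps it whole.
quoted_line = " ".join(shlex.quote(name) for name in files)
quoted_tokens = shlex.split(quoted_line)

print(len(unquoted_tokens), unquoted_tokens)
print(len(quoted_tokens), quoted_tokens)
```

When a Python script has to invoke other programs, passing arguments as a list (for example to subprocess.run) avoids shell tokenization entirely, which sidesteps the same problem.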

Unicode and emoji take this further. Modern filesystems handle Unicode reasonably well in isolation. The problem is cross-platform consistency and the tools that sit above the filesystem. A filename with an emoji in it might display correctly in Finder, refuse to tab-complete in certain terminals, fail to match in a regex that wasn't written to handle multi-byte sequences, and cause a Python script using the older string handling to throw a codec error.

There's the normalization issue. Unicode has multiple ways to represent the same character. A filename with an accented e might be stored as a single precomposed character on one system and as a base letter plus a combining accent on another. Those are different byte sequences. Git sees them as different files.

Which is a real source of mysterious duplicates in repositories when developers on different operating systems are working with filenames that include diacritics.
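The two representations can be shown directly with Python's unicodedata module. This is a small sketch; the filename and the helper same_name are illustrative, and comparing after NFC normalization is one reasonable choice rather than the only one.

```python
import unicodedata

name = "caf\u00e9.txt"  # "é" as a single precomposed code point

nfc = unicodedata.normalize("NFC", name)  # precomposed form
nfd = unicodedata.normalize("NFD", name)  # base letter plus combining accent


def same_name(a, b):
    """Compare two filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc == nfd)                      # the raw strings differ
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))
print(same_name(nfc, nfd))             # equal once normalized
```

The names look identical on screen, yet any tool that compares raw bytes will treat them as two different files.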

The principle that ties all of this together is the one Daniel pushed toward in the prompt. Files aren't labels. They're interfaces.

An interface that has to be consumed reliably by your shell, your build system, your version control, your CI runner, your deployment scripts, and every developer who clones the repository on whatever operating system they happen to be using. When you name a file, you're not just describing its contents. You're making a promise about how it can be referenced programmatically.

Breaking that promise doesn't always announce itself immediately. That's the thing that separates a filename problem from a code problem. A bad variable name causes a syntax error or a type error right away. A bad filename sits quietly until the environment changes, or until a script runs that wasn't written defensively, or until someone on a different OS joins the project.

Latent failures are always more expensive than immediate ones. The CI pipeline that breaks after a merge is expensive. The production deployment that fails because of a path issue that's been in the codebase for eight months is catastrophic.

Entirely preventable with about five minutes of thinking upfront.

Five minutes and a linter, honestly. Because the good news is most of this is automatable. You don't have to rely on developers remembering the rules under deadline pressure.

What does that actually look like in practice? Someone starts a new project — what are the concrete decisions they should be making on day one?

First decision: pick one convention for filenames and write it down. Not in your head. In a contributing guide, in a README, somewhere that a new team member will actually find it. The specific convention matters less than the consistency. kebab-case for everything is a perfectly defensible choice for a web project. snake_case for a Python project. What kills you is mixing them because nobody decided.

Is there a way to enforce that without it becoming a code review argument every time someone opens a PR?

That's exactly the right question, because code review is the worst place to catch this. By the time it's in a PR, someone has already done the work, and asking them to rename files feels petty even when it matters. You want the enforcement to happen before the commit, not after. Which is where pre-commit hooks come in.

The filesystem-level stuff?

Assume case-insensitive even if your current environment is case-sensitive. It costs you nothing to name files with that constraint in mind, and it means you'll never hit the Git rename trap. Practically: all lowercase, hyphens or underscores, no spaces, no special characters outside that set. That's a rule you can put in a pre-commit hook and enforce automatically.

Pre-commit hooks are underused for exactly this kind of thing.

There's a tool called pre-commit, the Python package, that makes it trivial to run filename checks on staged files before anything gets committed. You can write a hook in twenty lines that rejects any filename containing a space, an uppercase letter in a context where you've decided on lowercase, or a character outside your allowed set. The failure is immediate and local, not three steps downstream in CI.
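A hook along those lines might look like the Python sketch below. It assumes the hook receives the staged paths as command-line arguments (which is how pre-commit invokes hooks) and that the team has chosen lowercase letters, digits, dots, hyphens, and underscores as its allowed set. The names ALLOWED, MAX_LEN, check_paths, and main are hypothetical.

```python
import re
import sys
from pathlib import PurePosixPath

# Assumed team rule: lowercase letters, digits, dot, hyphen, underscore only.
ALLOWED = re.compile(r"[a-z0-9._-]+")
MAX_LEN = 100  # rule-of-thumb length cap per path component


def check_paths(paths):
    """Return a list of human-readable problems for the given relative paths."""
    problems = []
    for path in paths:
        for part in PurePosixPath(path).parts:
            if not ALLOWED.fullmatch(part):
                problems.append(f"{path}: '{part}' has characters outside the allowed set")
            elif len(part) > MAX_LEN:
                problems.append(f"{path}: '{part}' is longer than {MAX_LEN} characters")
    return problems


def main(argv):
    """Print problems to stderr and return a process exit code."""
    problems = check_paths(argv)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# As a hook script, the file would end with:
#     raise SystemExit(main(sys.argv[1:]))
```

A non-zero return code is what blocks the commit, so the developer sees the offending name locally instead of in a CI log. Note that an all-lowercase rule also rejects conventional names such as README.md, so a real rule set would likely need an explicit exception list.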

What about the length issue? The byte-versus-character problem on ext4?

Keep filenames short. Under a hundred characters is a good rule of thumb that gives you headroom on every major filesystem regardless of encoding. If a filename is approaching two hundred characters, that's usually a sign the path structure needs rethinking, not that you need to count bytes.

Stick to ASCII for filenames. Not because Unicode support is bad, but because the cross-platform normalization problems are subtle enough that they'll bite you at the worst moment. If you're building tooling that has to handle arbitrary user-supplied filenames, sanitize on ingest. Strip or transliterate anything outside ASCII before it touches your filesystem.

The interface principle in action. You control the contract at the boundary.
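Sanitizing on ingest could be sketched like this in Python. The function name, the allowed character set, the 100-character cap, and the "unnamed" fallback are all assumptions chosen for illustration; characters with no ASCII decomposition are simply dropped rather than transliterated.

```python
import re
import unicodedata


def sanitize_filename(name, max_len=100):
    """Reduce an arbitrary user-supplied name to lowercase ASCII with safe separators."""
    # Decompose accented characters so the base letter survives the ASCII step.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    lowered = ascii_only.lower()
    # Collapse every run of disallowed characters (spaces included) into one hyphen.
    cleaned = re.sub(r"[^a-z0-9._-]+", "-", lowered).strip("-.")
    # Truncation is naive and may cut into the extension for very long names.
    return cleaned[:max_len] or "unnamed"


print(sanitize_filename("Quarterly Report.csv"))
print(sanitize_filename("Café Menu.txt"))
print(sanitize_filename("📊 stats.csv"))
```

Two different uploads can sanitize to the same name, so a real ingest path would also need a collision strategy such as appending a counter or a short hash.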

And the last thing I'd add is: treat your CI environment as the source of truth for what's acceptable. If your pipeline runs on Linux with ext4, your local development should be testing under those same constraints, not assuming macOS forgiveness will hold.

Which is an argument for containerized development environments, but that's a whole other episode.

It really is. Though I'll say — even without full containerization, something as simple as running your test suite inside a Docker container that uses a Linux base image catches a huge proportion of these issues before they reach CI. It's not a complete solution but it closes the most common gap.

But let's land the plane here, because I think the thing worth sitting with is how much of this is invisible until it isn't. You can ship software for years with sloppy file naming and never notice, and then one environment change, one new contributor on a different OS, one CI migration, and suddenly you're debugging something that looks completely unrelated to what's actually wrong.

The diagnosis is hard. The fix is rename the file, add a linter rule, update the convention doc. But you've already burned hours getting there.

Which is what makes it feel like such a waste. It's not a hard problem. It's a neglected one.

The forward-looking question I keep coming back to is whether filesystem design is going to catch up to the mess. There are experiments with content-addressable storage, systems where the identifier isn't a human-readable name at all but a hash of the content. Git's internal object store works that way already. If that model ever surfaces at the filesystem level, a lot of these naming problems just dissolve.

Though you'd introduce a completely different set of problems around human legibility. Someone has to know what the hash refers to.

There's probably no world where you fully escape the tension between names that are meaningful to humans and names that are safe for machines. The best you can do is be deliberate about where you sit on that spectrum and enforce it consistently.

Which is, honestly, a reasonable place to leave it. Don't assume the filesystem is forgiving just because your laptop is.

Write it down before you need it, not after.

Thanks to Hilbert Flumingtop for producing the show, and to Modal for keeping our infrastructure from doing exactly what we've been describing for the last twenty-five minutes. This has been My Weird Prompts. If you've got a moment, a review on Spotify goes a long way. We'll see you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
