#2935: Notebooks vs Scripts: The Real Tradeoffs

Why data scientists love notebooks but engineers distrust them — and who's right.

Episode Details

Episode ID: MWP-3105
Published: May 20
Duration: 27:50
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: software-development data-integrity automation

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Notebooks have become the default interface for an entire generation of data scientists, yet they remain deeply controversial in the engineering world. The core tension comes down to purpose: notebooks are documents designed for exploration, not scripts designed for production. A Jupyter notebook interleaves code, markdown, and outputs in a stateful environment where the kernel remembers everything executed so far. This makes them brilliant for exploratory analysis: you can tweak one cell and see results instantly without rerunning an entire pipeline.

But that same statefulness introduces hidden state problems. Running cells out of order can leave a notebook looking correct while producing unreproducible results, a phenomenon known as the run order dependency problem. Tools like Netflix's Papermill address this by enforcing deterministic top-to-bottom execution, treating notebooks as parameterized scripts.

Cloud providers have turned notebooks into a gateway drug for their ecosystems: Google Colab, AWS SageMaker Studio Lab, and Databricks all offer free or cheap notebook access to hook users on their broader platforms. The real cost comes in data privacy, vendor lock-in, and egress fees. For production use, notebooks work best when wrapped in platform-level discipline: scheduled reporting with deterministic execution, or as one component in a larger orchestrated pipeline. The notebook isn't the problem. The trouble comes from the loose notebook file passed around like a Word document.

Daniel sent us this one — he's asking why notebooks matter so much in data science. He's used them, gets how they work, but he's still confused about something: why wouldn't you just use a repository? Are notebooks purely for experimentation, or can they actually serve as production environments for analysis and visualization? And are cloud-native notebooks now dominating over locally hosted ones? There's a lot packed into this.

There really is. And the core tension here is something I've watched play out for years — notebooks are everywhere, they're the default interface for an entire generation of data scientists, and yet half the engineering world looks at them like they're a toy. So which is it?

The weirder thing to me is the economics. Cloud providers are pouring serious money into notebook platforms — free GPUs, managed environments, the works. Nobody gives away compute because they're feeling generous. So there's a play here, and I think understanding the notebook's role means understanding what everyone's actually buying and selling.

Let's start with what a notebook actually is, because the definition contains the whole argument. A Jupyter notebook is a document that interleaves code cells with markdown cells and outputs — charts, tables, equations. You execute one cell at a time, and the results appear right below it. It's literate programming meets a read-eval-print loop. The key word there is document. It's not a script you run top to bottom. It's a stateful artifact where the kernel remembers everything you've done so far.

Scripts don't.

Scripts don't. A Python file is a set of instructions that execute in order every time you run it. A notebook is a conversation with an interpreter. You run cell one, then cell two, then you go back and tweak cell one and run it again — and now cell two's output might not reflect what's actually in cell one anymore, because you changed the variable but didn't re-execute downstream.

The notebook is lying to you. Or rather, you're lying to yourself and the notebook is just faithfully recording the lie.

This is what people mean by hidden state. The notebook's displayed output depends on the order you ran cells, not on the order they appear on the page. You can have a notebook that looks perfectly logical top to bottom and produces completely unreproducible results because the author ran cell seven before cell three and never cleaned up.

Which makes it sound like a disaster waiting to happen. And yet here we are, with millions of people using them daily. So clearly something about this model works.

It works brilliantly for what it was designed for. Project Jupyter spun out of IPython in twenty fourteen, but the notebook concept goes back to Mathematica in nineteen eighty-eight — the idea that code, explanation, and output should live together in a single document. For exploratory data analysis, that's transformative. You load some data, you plot a histogram, you notice a weird spike, you filter down to just that segment, you plot again, you write a markdown note about what you found. Each step builds on the last, and the whole narrative is right there.

It's a lab notebook. Like, an actual physical lab notebook where a scientist records observations, sketches results, writes down what they tried and what happened.

That's exactly the right analogy. And just like a physicist wouldn't publish their raw lab notes as a finished paper, you shouldn't ship a raw notebook as a production pipeline. The lab notebook is for you — for thinking, for exploring, for documenting the process. The paper is the polished, reproducible artifact.

Which answers part of the question right there. Notebooks are for exploration and communication. Repositories are for production. But that's too clean, isn't it? Because people do ship notebooks to production all the time.

They do, and it's not always wrong. But let's sit with the exploration-versus-production distinction first, because it explains most of the tradeoffs. In exploration, you want fast feedback. You want to try something, see the result immediately, and decide what to do next. The notebook's cell-by-cell execution gives you that. You don't have to rerun the entire pipeline just to tweak a plot's color scheme — you just rerun the plotting cell. For a data scientist doing exploratory analysis on a dataset they've never seen before, that speed matters enormously.

The alternative — a repository full of scripts — forces you to think about the whole pipeline every time.

Which is great when you know what you're doing. But when you don't know what you're doing yet — when you're still figuring out what questions to ask — the overhead of structuring everything as version-controlled, reproducible scripts is a tax on thinking. The notebook lets you be messy.

Messy isn't a bug, it's a feature.

In the right context. The problem is when that messiness leaks into production. Papermill, an open-source tool that Netflix adopted and publicized around twenty eighteen, was built specifically to address this. Papermill lets you parameterize notebooks and execute them deterministically: you pass in parameters, it runs every cell in order from top to bottom, and produces an output notebook with all the results. It's an attempt to make notebooks behave more like scripts.
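The idea behind that kind of deterministic, parameterized run can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Papermill's implementation or API (Papermill's own entry point is `papermill.execute_notebook`); the `run_notebook` helper and the two toy cells are hypothetical.

```python
def run_notebook(cells, parameters):
    """Run every cell in page order inside a brand-new namespace.

    Hypothetical helper: `cells` is a list of source strings and
    `parameters` is a dict injected before the first cell runs.
    """
    namespace = dict(parameters)  # fresh state, nothing left over from earlier runs
    for source in cells:
        exec(source, namespace)   # strictly top to bottom, each cell exactly once
    return namespace


# Two toy "cells"; `scale` is the injected parameter.
cells = [
    "values = [x * scale for x in range(1, 4)]",
    "total = sum(values)",
]

result_a = run_notebook(cells, {"scale": 2})
result_b = run_notebook(cells, {"scale": 10})
print(result_a["total"], result_b["total"])
```

Because each run starts from an empty namespace, the result depends only on the cell sources and the parameters, never on what was executed earlier in a session.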

You get the notebook format — the nice rendered outputs, the markdown narrative — but with enforced execution order. That sounds like the best of both worlds.

It's a compromise. And Netflix's broader framework, Metaflow, treats notebooks as one component in a pipeline, not the pipeline itself. You might prototype a feature engineering step in a notebook, but when it's ready, it gets extracted into a proper Python module that Metaflow orchestrates. The notebook is the sketchpad, not the blueprint.

I want to go back to the state problem, because I think it's the thing that trips people up most. You mentioned execution order. Can you walk through a concrete example of how this actually breaks things?

Imagine you're analyzing customer churn. You load a CSV in cell one, clean the data in cell two, and train a model in cell three. You get an accuracy of eighty-two percent. Then you go back to cell two and add a line that drops rows with missing values, but you only run cell two. Cell three still shows eighty-two percent accuracy, but that number came from a model trained on the old, uncleaned data; cell three never saw the change. The notebook looks right, because cell two clearly shows your new cleaning code, but the displayed output is stale. If you handed this notebook to a colleague and they ran it top to bottom, they'd get a different result.
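That scenario can be simulated without Jupyter at all. In the sketch below, each "cell" is a function acting on a shared dictionary that stands in for the kernel's memory. All names and the toy data are hypothetical, and the "training" step simply records how many rows it saw.

```python
# The shared dict plays the role of the kernel's memory.

def cell_1_load(ns):
    # Stand-in for loading a CSV; None marks a missing value.
    ns["rows"] = [(1.0, 1), (None, 0), (3.0, 1), (None, 1), (5.0, 0)]

def cell_2_clean_v1(ns):
    # Original cleaning step: keeps every row.
    ns["clean"] = list(ns["rows"])

def cell_2_clean_v2(ns):
    # Edited cleaning step: drops rows with a missing feature.
    ns["clean"] = [row for row in ns["rows"] if row[0] is not None]

def cell_3_train(ns):
    # Stand-in for training: record how many rows the "model" saw.
    ns["displayed_result"] = len(ns["clean"])

# Interactive session: run cells 1, 2, 3, then edit and rerun only cell 2.
session = {}
cell_1_load(session)
cell_2_clean_v1(session)
cell_3_train(session)
cell_2_clean_v2(session)          # cell 3 is NOT rerun
stale = session["displayed_result"]

# A colleague restarts and runs the code on the page top to bottom.
fresh = {}
cell_1_load(fresh)
cell_2_clean_v2(fresh)
cell_3_train(fresh)
reproduced = fresh["displayed_result"]

print(stale, reproduced)  # the two numbers disagree
```

The page shows the new cleaning code next to a result computed before that code existed, which is exactly the mismatch a top-to-bottom rerun exposes.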

That's the best-case scenario, where they get a different result and notice. The nightmare case is when the notebook accidentally works because of some variable left over from a cell you deleted three hours ago.

There's a name for this: the run order dependency problem. And it's why Joel Grus gave his famous "I Don't Like Notebooks" talk at JupyterCon in twenty eighteen. It's got over two hundred thousand views on YouTube now. His argument was basically that notebooks encourage bad software engineering practices. No testing, no modularity, hidden state, and a workflow that makes it far too easy to run things in the wrong order.

I remember the backlash to that talk. The Jupyter community's defense was essentially: you're judging a fish by its ability to climb a tree. Notebooks aren't meant to be software engineering tools.

Which is fair, but it dodges the real issue. Most data scientists aren't software engineers, and they're the ones building models that end up in production. If their primary tool encourages unreproducible workflows, that's a problem that extends beyond personal preference.

Where does that leave the question of using notebooks in production? If the state problem is inherent to the format, is there any legitimate production use case?

There are a few. The most defensible one is scheduled reporting. You have a notebook that pulls data from a database, generates a set of charts and tables, and exports them as an HTML report. If you run it deterministically — every cell in order, from a clean kernel, every time — it's essentially a script that happens to produce nice visual output. Tools like nbconvert can execute notebooks in this way, and Papermill adds parameterization on top.

It's not that notebooks can't be used in production. It's that they require discipline — you have to enforce what a script gives you for free.

And some platforms enforce this for you. Databricks Notebooks are a good example. They're the default interface for building Delta Lake pipelines, and they work in production because Databricks manages execution order and scheduling at the platform level. You're not just emailing a dot ipynb file to someone and hoping they run it correctly. The platform wraps the notebook in enough infrastructure to make it reliable.

That's a useful distinction. The notebook format itself is unreliable; the notebook plus a platform that enforces discipline can be reliable. It's not the notebook that's the problem, it's the loose notebook file passed around like a Word document.

That brings us to the cloud providers. Because what they're selling isn't really notebooks — it's the platform around them.

Let's talk about that. Google Colab launched in twenty seventeen and had over ten million users by twenty twenty-four, with more than sixty percent on the free tier. AWS launched SageMaker Studio Lab in December twenty twenty-one as a free tier. Databricks has had notebooks since twenty fifteen. Every major cloud provider now has a notebook offering, and they're all suspiciously cheap or free. What's the actual business model here?

It's a land grab. The notebook is the entry point to the data science workflow. If you get someone comfortable running notebooks on your platform, you're one click away from selling them storage, model training, deployment, monitoring — the whole stack. Colab's free tier gives you a GPU for a few hours. It's genuinely useful for students and hobbyists. But once you outgrow it, the path to Colab Pro at ten dollars a month, and then to Vertex AI Workbench at hundreds of dollars a month, is frictionless.

The notebook as gateway drug.

That's exactly what it is. And it's brilliant. Students learn data science on Colab because it's free, requires no setup, and runs on any machine with a browser. When they graduate and get jobs, they already know the Google Cloud ecosystem. The notebook is the on-ramp to a very expensive highway.

Which means the "free" notebook isn't free. You're paying with lock-in and with data. Colab's terms have been ambiguous at times about whether they train on your data — there's a real privacy consideration if you're working with anything sensitive.

That's one of the major misconceptions. People assume cloud notebooks are better because they're free and convenient. But free tiers come with real costs — data privacy risks, vendor lock-in, and usage limits that can bite you at exactly the wrong moment. If you're analyzing proprietary financial data or healthcare records, running that through a free cloud notebook is probably a bad idea.

Even beyond privacy, there's the egress problem. Once your data is in one cloud, getting it out costs money. The notebook is free, but the data gravity pulls you in.

So the local-versus-cloud decision isn't just about features. It's about control versus convenience. A local Jupyter install gives you full control — your data stays on your machine, you can customize everything, there are no usage limits except your hardware. But you lose collaboration features, you don't get free GPUs, and you're responsible for managing your own environment.

Which for a lot of people is a feature, not a bug. If you're working with sensitive data, "nobody else can access it" is the whole point.

There's been a parallel trend here that's worth mentioning. Alongside the cloud notebook platforms, we've seen the rise of what I'd call notebook-as-a-service companies — Hex launched in twenty twenty, Deepnote in twenty nineteen, Observable in twenty nineteen. These aren't just hosted Jupyter. They're reimagining what a notebook can be.

Observable is the interesting one to me. It's not a Jupyter kernel at all — it's JavaScript-native, and cells automatically re-evaluate when their dependencies change. It actually solves the execution order problem by making the notebook reactive.

It's a fundamentally different model. In a Jupyter notebook, you manage state explicitly by choosing which cells to run when. In Observable, the notebook is a directed acyclic graph — each cell declares its dependencies, and the platform figures out execution order. Change one cell, and everything downstream updates automatically. It's what Jupyter would look like if it were designed today rather than evolving from a Python REPL.
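The dependency-graph model can be sketched in Python with the standard library's `graphlib` (Python 3.9 or later). This is not how Observable is implemented; the cell names and the `evaluate` helper are hypothetical, and for simplicity the sketch recomputes every cell, whereas a real reactive runtime would recompute only the cells downstream of a change.

```python
from graphlib import TopologicalSorter

# Each hypothetical cell: (names it depends on, function computing its value).
cells = {
    "raw": ((), lambda: [3, 1, 2]),
    "cleaned": (("raw",), lambda raw: sorted(raw)),
    "largest": (("cleaned",), lambda cleaned: cleaned[-1]),
}

def evaluate(cells):
    """Evaluate all cells in dependency order, regardless of page order."""
    graph = {name: set(deps) for name, (deps, _) in cells.items()}
    values = {}
    # static_order() yields each cell only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        deps, compute = cells[name]
        values[name] = compute(*(values[dep] for dep in deps))
    return values

before = evaluate(cells)

# "Edit" the first cell; everything downstream picks up the change.
cells["raw"] = ((), lambda: [3, 1, 2, 9])
after = evaluate(cells)

print(before["largest"], after["largest"])
```

Since execution order is derived from the dependencies rather than chosen by the user, there is no way to display a downstream result computed from an outdated upstream cell.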

Hex takes a different approach — they've added what they call "app mode," where you can turn a notebook into an interactive dashboard that non-technical stakeholders can use. That's a genuine production pattern. The notebook is the analysis environment, and with one click it becomes the deliverable.

Which answers the question about whether notebooks can serve as production environments for analysis and visualization. The answer is yes, but not in the way most people think. You're not putting the raw notebook in front of executives. You're using a platform that wraps the notebook's logic in a clean interface. The notebook is the engine, not the car.

We've got three tiers now. Raw notebooks — dot ipynb files — which are great for exploration but dangerous for anything else. Platform-wrapped notebooks — Databricks, Papermill — which add enough discipline to be production-viable. And next-generation notebooks — Observable, Hex — which rethink the paradigm to eliminate some of the original problems.

All three tiers are growing. The raw notebook isn't going away, because it's the simplest thing that works for individual exploration. But the center of gravity is shifting toward platforms.

Are cloud-native notebooks now dominating over locally hosted ones? I'd say yes for the broad user base, no for specific high-security or high-performance niches. The student writing their first neural net is on Colab. The researcher sharing a paper's supplementary materials is on Colab or GitHub Codespaces. But the quant fund analyzing proprietary trading data is almost certainly running Jupyter locally, behind a firewall.

There's no public market share data that breaks this down cleanly, but the trend is clear from job postings and conference talks. Five years ago, "knows Jupyter" meant "has Python and Jupyter installed locally." Now it increasingly means "has used Colab or SageMaker." The default has shifted to cloud.

Which has implications for how data science is taught and practiced. If everyone learns on cloud notebooks, the skills around environment management, dependency resolution, and local deployment atrophy. You get data scientists who've never installed a package from the command line.

That's a real concern. But the counterargument is that those skills shouldn't be necessary. If the platform handles environments, the data scientist can focus on data science. It's the same argument people made about managed cloud services versus running your own servers.

I'm of two minds about it. On one hand, abstraction is progress. On the other hand, abstraction without understanding creates fragile practitioners who can't debug when the abstraction leaks. And abstractions always leak.

And the notebook abstraction leaks exactly at the state management boundary. When something goes wrong — when a cell produces unexpected output, or the kernel dies, or a variable isn't what you think it is — you need to understand the execution model to debug it. If you've only ever used notebooks in a managed environment where things mostly just work, that debugging skill might not be there.

Let me pull on a thread you mentioned earlier. You said Databricks Notebooks work in production because the platform enforces execution order. But isn't that just turning notebooks back into scripts? What's the value of the notebook format at that point?

That's a sharp question. The value is the rendered output. A script produces text to standard out or saves figures to disk. A notebook embeds charts, tables, and formatted text inline. When you open a Databricks notebook that ran last night, you see the results immediately in context — the SQL query, then the resulting table, then the chart built from that table, all in one scrollable document. That's useful for understanding what happened.

It's a report that happens to be executable.

And for scheduled data analysis pipelines, that's often exactly what you want. The notebook documents what was done and shows the results, and because the platform ran it deterministically, you can trust that what you're seeing reflects what actually executed.

The notebook's real superpower isn't interactivity — it's narrative. The ability to tell a story with code and results woven together.

I'd say it's both, but the narrative part is what makes notebooks uniquely valuable. There's no other format where you can show your work so transparently. A script says "here's what I did." A notebook says "here's what I did, here's why I did it, here's what happened, and here's what I think about it."

Which is why they've become the standard for sharing research. If you publish a paper with a companion notebook, reviewers and readers can reproduce your analysis — or at least attempt to. Whether they succeed depends on how disciplined you were about execution order.

That brings us to something actionable. If you're using notebooks, and most data scientists are, how do you make sure your work is actually reproducible? There are tools for this. nbQA lets you run standard Python code quality tools on notebooks. Pytest plugins such as nbval and nbmake let you test notebooks as part of a test suite. The simplest check is to restart your kernel and run all cells from top to bottom before sharing. If the output changes, you have a state problem.
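A cheap complement to restart-and-run-all is to inspect the saved file. A dot ipynb file is JSON, and each code cell stores an `execution_count`. The sketch below, using a hypothetical `ran_top_to_bottom` helper, flags notebooks whose counts are not 1, 2, 3 and so on in page order. Passing this check does not prove the results are reproducible; failing it only shows the saved outputs did not come from a single clean top-to-bottom run (a code cell that was never executed also fails it).

```python
import json

def ran_top_to_bottom(notebook_json):
    """Return True if code cells carry execution counts 1..n in page order."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return counts == list(range(1, len(counts) + 1))

def make_notebook(counts):
    """Build a minimal notebook document for demonstration purposes only."""
    cells = [
        {"cell_type": "code", "execution_count": count,
         "source": [], "outputs": [], "metadata": {}}
        for count in counts
    ]
    return json.dumps(
        {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    )

in_order = make_notebook([1, 2, 3])
out_of_order = make_notebook([1, 7, 3])   # the second cell was rerun later

print(ran_top_to_bottom(in_order), ran_top_to_bottom(out_of_order))
```

In practice the JSON string would come from reading the notebook file from disk rather than from `make_notebook`.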

Kernel restart and run all. It's the notebook equivalent of "did you try turning it off and on again," and it catches an enormous number of issues.

It really does. And yet most people don't do it.

Because it takes thirty seconds and the human brain hates waiting thirty seconds for anything.

That's the deeper tension. Notebooks optimize for speed of thought — instant feedback, no waiting for a full pipeline to execute. But that optimization creates a debt that comes due when you need reproducibility. The lab notebook is quick to write in and slow to clean up for publication.

To answer the core question directly: why wouldn't you just use a repository? Because a repository is optimized for a different thing. Repositories are for building software — version control, modular code, automated testing, reproducible builds. Notebooks are for thinking with data — exploration, visualization, narrative. The two aren't competitors; they're complementary tools for different stages of work.

The boundary between them is where the interesting engineering happens. Netflix extracting notebook logic into Metaflow modules. Databricks scheduling notebooks as production jobs. Hex turning notebooks into dashboards. The question isn't "notebooks or repositories." It's "how do you move work from the notebook to the repository when it's ready, and what infrastructure helps you do that?"

There's a broader trend here that I think is worth naming. We're watching the notebook format evolve from a personal tool into a collaborative platform. JupyterLab's extension ecosystem is turning it into something that looks more like an IDE. Real-time collaboration is becoming standard — Google Colab has had it for years, Deepnote built their whole product around it. The notebook is becoming a shared workspace, not just a private scratchpad.

That shift changes who the notebook is for. A private scratchpad is for the individual data scientist. A shared workspace is for teams. The features that matter shift accordingly — version history, commenting, access control, environment consistency across users. These are platform problems, not notebook problems.

Which is exactly why the cloud providers are investing. The platform layer is where the value — and the lock-in — lives.

If I had to predict where this goes, I'd say we're heading toward a bifurcation. The notebook format will persist as a personal exploration tool — JupyterLab running locally, free and open source, no platform needed. And alongside it, we'll see the rise of notebook-native platforms that handle collaboration, scheduling, and deployment, each tied to a specific cloud ecosystem. The format is open, but the workflow is proprietary.

That's already happened, hasn't it? You can export a Colab notebook as a dot ipynb file and run it anywhere. But the collaborative features, the GPU allocation, the integration with Google Drive — those don't export. The notebook is portable; the experience isn't.

That's the business model in a sentence.

Given all of this, what should someone actually do? The prompt is asking for practical guidance, not just theory.

First, use notebooks for what they're good at — exploration, prototyping, and communication. When you're figuring out what the data says, or building a visualization to show a stakeholder, or documenting your analysis process, notebooks are the right tool. Don't apologize for using them.

Second, when something needs to run repeatedly or reliably, move it out of the notebook. Extract the logic into Python modules, put them in version control, write tests. The notebook is the sketch; the repo is the blueprint. Both matter, but don't confuse them.
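As a small illustration of that extraction step, here is what a cleaning rule might look like once it leaves the notebook and becomes a plain function with a test. The function name and the row format are hypothetical; in a real project the function would live in a module under version control and the test would be run by a test runner such as pytest.

```python
def drop_incomplete_rows(rows):
    """Return only the rows that contain no None values."""
    return [row for row in rows if all(value is not None for value in row)]

def test_drop_incomplete_rows():
    rows = [(1.0, "a"), (None, "b"), (2.0, None)]
    # Only the fully populated row survives.
    assert drop_incomplete_rows(rows) == [(1.0, "a")]
    # The input list is left untouched.
    assert len(rows) == 3

# Called directly here so the sketch is self-checking when run as a script.
test_drop_incomplete_rows()
```

The notebook can then import the function instead of redefining it in a cell, so the exploratory work and the production pipeline share one tested definition.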

Third, choose your platform based on your constraints, not on what's trendy. If you're working with sensitive data, run locally. If you need GPUs and collaboration, use a cloud platform, but understand what you're trading away. And if you're building something that non-technical people need to use, look at Hex or Observable — the next-generation tools that treat notebooks as application platforms, not just documents.

I'd add a fourth: test your notebooks. Restart the kernel, run all cells, and verify the output. If you can't do that and get consistent results, your notebook is lying to someone — maybe you, maybe a colleague, maybe a decision-maker who's going to act on your analysis.

That's the lab notebook principle again. A lab notebook that can't be read and understood by someone else isn't a lab notebook — it's a diary. Diaries are fine, but they shouldn't drive business decisions.

Covering the covers.

Just — we've been talking about notebooks for twenty minutes and I'm realizing the whole field has basically reinvented the lab notebook in software form, complete with the same problems. Scientists have been arguing about how to keep proper lab notebooks for centuries. Execution order, reproducibility, narrative versus raw data — it's all there.

The computational notebook is a very old idea implemented with very new technology. Mathematica had it in nineteen eighty-eight. Jupyter democratized it in twenty fourteen. But the fundamental challenge — how do you balance exploration with rigor — hasn't changed.

The cloud providers are betting that the answer is "let us handle the rigor, you focus on the exploration." Which is a compelling pitch, as long as you're comfortable with what they're charging.

The real cost isn't the ten dollars a month for Colab Pro. It's that five years from now, your entire workflow depends on a platform you can't easily leave. That's the lock-in play, and it's working.

The open question, I think, is whether notebooks evolve into full-fledged development environments that replace traditional IDEs for data work, or whether they remain a specialized tool for a specific stage of the workflow. JupyterLab is already blurring this line with its extension system — you can have terminals, file browsers, debuggers, all inside what's technically a notebook interface.

On the other side, VS Code has built notebook support directly into the editor. You can open a dot ipynb file in VS Code and get the cell-by-cell experience with full IDE features around it. So the convergence is happening from both directions.

Which means the notebook might not be a thing at all in a few years — it might just be a mode. A way of working that every development environment supports, rather than a separate category of tool.

That's the most interesting outcome to me. The notebook as a feature, not a product. And if that happens, the cloud platforms' lock-in play gets weaker, because the notebook experience becomes a commodity.

Which is why they're building the platform layer as fast as they can. Get the collaboration, the scheduling, the deployment integrated before the notebook itself becomes interchangeable.

The race isn't about who has the best notebook. It's about who has the best ecosystem around the notebook.

Next time you open a notebook — and I mean anyone listening — ask yourself: am I exploring, or am I building? Because those are different activities, and they deserve different tools and different standards of rigor. The notebook is a phenomenal exploration tool. It's a mediocre construction tool. Know which one you're doing.

If you're not sure, restart your kernel and run all. The answer will reveal itself.

Now: Hilbert's daily fun fact.

Hilbert: In the eighteen eighties, prospectors in the Namib Desert discovered tubes of fused silica glass formed when lightning struck the sand — fulgurites — some of which exhibit a rare optical property called birefringence, splitting light into two polarized rays that travel at different speeds through the glass.

Lightning makes polarized glass tubes in the desert. Of course it does.

I'm going to pretend I knew what birefringence was before this moment.

No you're not.

No I'm not.

This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop for keeping the show running. If you enjoyed this episode, leave us a review wherever you listen — it helps. We're at myweirdprompts dot com for the full archive. We'll be back soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
