#2501: Describing a Neighborhood: Databases Without Screens

Can you design a relational database using only your voice? We coach a beginner through PostgreSQL from scratch.

Featuring

Daniel

Corn

Herman

Episode Details

Episode ID: MWP-2659
Published: Apr 28
Updated: May 15
Duration: 38:27
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: software-development audio-engineering postgresql

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

This episode tackles a deceptively simple challenge: can you design a complete relational database schema using only spoken words, with no visual aids? The answer is yes — and the process reveals just how intuitive good database design can be when you focus on relationships rather than syntax.

Starting from Zero

The journey begins where every database project does: installation. For macOS users, Homebrew offers the cleanest path with brew install postgresql. On Linux, the standard apt approach works, but requires switching to the postgres system user — a security measure that prevents unauthorized access to the database cluster. This initial hurdle (the "bouncer at the door" problem) confuses many newcomers, but once you understand that PostgreSQL trusts only its own system user by default, the logic becomes clear.

The Entity Model

The core of the exercise is identifying the business nouns of a movie theater: movies, screens, showtimes, seats, customers, and bookings. Each of these becomes a table. The relationships between them are where the real design thinking happens. A showtime belongs to exactly one movie and one screen. A seat belongs to exactly one screen. A booking connects a customer to a specific seat at a specific showtime. These are all one-to-many relationships, implemented through foreign keys.

Design Decisions and Tradeoffs

Several important design choices emerge during the schema construction. The screens table stores a capacity column directly, even though capacity could be computed by counting rows in the seats table. This is denormalization: a deliberate tradeoff that accepts redundant data, and the risk of that data going stale, in exchange for query speed. In a small theater with a few hundred seats, the performance difference is negligible. In a stadium with 80,000 seats, that aggregation query becomes expensive.

The showtimes table uses TIMESTAMPTZ (timestamp with time zone) rather than plain TIMESTAMP. This is critical: PostgreSQL stores timestamptz values internally as UTC and converts to the session's time zone on retrieval. A theater in New York showing an 8 PM showtime means different things to customers in different time zones, and timestamptz handles this correctly. Plain timestamp stores "8 PM" with no context — a recipe for the kind of timezone bugs that plague event-scheduling applications.

For ticket prices, the schema uses NUMERIC(6,2) rather than FLOAT. This is because floating-point numbers are approximate — they store values in binary and can introduce tiny rounding errors. For money, exact decimal arithmetic is essential. This is the same reason banks don't use floats for account balances.

The Booking Table: Where Everything Connects

The bookings table ties together showtimes, customers, and seats. It uses a DEFAULT NOW() clause for the booked_at timestamp, automatically recording when each reservation was made. This table represents the culmination of the entire schema — every foreign key in it references a primary key in another table, creating a web of relationships that the database enforces automatically.

Lessons for Beginners

The episode demonstrates that relational database design is fundamentally about thinking in terms of entities and their relationships. Once you can describe those relationships in plain English ("a showtime belongs to one movie and one screen"), translating them into SQL becomes a mechanical process. The specific syntax — SERIAL vs. GENERATED ALWAYS AS IDENTITY, snake_case naming conventions, plural table names — matters less than getting the relationships right. PostgreSQL gives you guardrails: foreign key constraints prevent orphaned records, UNIQUE constraints prevent duplicates, and data types enforce correctness. The database is a partner in keeping your data clean, not an obstacle to overcome.

Mentions

Homebrew: Package manager for macOS
MySQL: Popular open source relational database
Postgres.app: Easiest way to run PostgreSQL on Mac
PostgreSQL: Powerful open source relational database system
SQL Server: Microsoft's enterprise relational database
SQLite: Embedded SQL database engine


Featured In

Creator's Picks 304 episodes

Transcript

Daniel sent us this one — another audio-coding challenge. This time it's SQL, specifically PostgreSQL. The idea is we coach a complete beginner, someone who's never touched a database, through designing and querying a relational schema for a small movie theater. Using only our voices. No screen sharing, no diagrams. Same format as the Python and TypeScript challenges.

Oh, this is going to be fun. Databases are actually perfect for audio. Relationships are conceptual — "a showtime belongs to one movie and one screen" — that's a sentence. You don't need to see it.

That's true. It's almost like we're describing a neighborhood. The movie lives on this street, the screen lives on that street, and the showtime is the intersection where they meet. You can picture that without a map.

And by the way, today's script is coming to us from DeepSeek V four Pro.

Alright, let's see what they've got for us. So where do we start with this?

I think we start where the beginner starts — with nothing installed. Let's get PostgreSQL onto their machine first, then we build the whole thing from scratch.

If you're on a Mac with Homebrew, it's brew install postgresql, then brew services start postgresql. If you're on Ubuntu or Debian, sudo apt update, sudo apt install postgresql, then you'll need to sudo dash i dash u postgres to switch to the postgres user. There's also Postgres dot app for Mac which is a one-click install — honestly that might be the easiest path if you're brand new.

I want to pause on that switching-to-postgres-user step, because it confuses everyone the first time. When you install PostgreSQL on Linux, it creates a system user called "postgres" that owns all the data files and has admin access to the database cluster. You can't just run psql as your regular user right away — you have to become the postgres user first. It's a security thing. The database won't trust just anyone who walks up to it.

That's a good callout. It's like the database has its own bouncer at the door, and the only person on the guest list initially is that postgres system user. You can add yourself later with a CREATE ROLE command, but to get started, you borrow the bouncer's ID.

The key thing after installation is you need to be able to type psql and get a prompt. That's the PostgreSQL command-line client. If you type psql and it says command not found, the install didn't finish properly or the path isn't set up.

Once you're in psql, the first thing we do is create a database. The command is CREATE DATABASE cinema semicolon. Type that, hit enter. Then backslash c cinema to connect to it. Backslash c is the psql command for "connect to this database." The prompt will change to show you're in the cinema database.

Quick psql tip here: if you ever forget which database you're connected to, type SELECT current_database open paren close paren semicolon. It'll tell you. And backslash l lists all databases on the server. Those two commands alone will save you so much confusion.

Now before we write a single CREATE TABLE, we need to think about what we're modeling. A movie theater. What entities exist here? What are the nouns in this business?

This is the part I love. Let's think out loud. We have movies — that's obvious. We have screens, or auditoriums — the physical rooms where movies play. We have showtimes — a specific movie playing in a specific screen at a specific time. We have seats — individual seats in each screen. We have customers — people who buy tickets. And we have bookings — a customer reserving a specific seat for a specific showtime.

The relationships between them. A showtime belongs to exactly one movie and exactly one screen. A seat belongs to exactly one screen. A booking links a customer to a seat at a showtime. These are all one-to-many relationships. One screen has many seats. One movie has many showtimes.

That's the whole model. The relationships are the foreign keys. Let's start writing them. I'll narrate the syntax.

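Before you do, we should talk about naming. PostgreSQL likes snake case: all lowercase, with underscores between words. Part of the reason is that PostgreSQL folds unquoted identifiers to lowercase, so mixed-case names only cause trouble. And table names are plural by convention: movies and screens, not movie and screen.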

I've always wondered if there's a real reason or if it's just tradition.

It's mostly tradition, but it does make JOINs read more naturally. You're selecting from movies, joining to showtimes — each table feels like a collection of things. But I've seen singular used effectively too. The important thing is to pick one and stick with it. Nothing worse than a schema where half the tables are plural and half are singular.

Alright, first table. Here we go. CREATE TABLE movies open paren, new line, indent. id SERIAL PRIMARY KEY comma, new line. title TEXT NOT NULL comma, new line. duration underscore min INTEGER NOT NULL comma, new line. release underscore year INTEGER, new line. close paren semicolon.

Let's unpack that. SERIAL is PostgreSQL shorthand. It creates an integer column and automatically attaches a sequence so every new row gets an auto-incrementing number. The PRIMARY KEY part means this column uniquely identifies each row and the database will index it automatically.

There's actually a modern alternative called GENERATED ALWAYS AS IDENTITY, which is the SQL standard way. It's stricter — it prevents you from accidentally inserting your own values into the id column. But SERIAL is everywhere in tutorials and it's simpler to explain. For this episode, we're using SERIAL.

I want to flag for the listener — if you're starting a brand new project in 2026, use GENERATED ALWAYS AS IDENTITY. It's the direction PostgreSQL is heading, and it prevents a really subtle bug where someone manually inserts an id of 5000, and then the sequence is still sitting at 27, and suddenly you get a duplicate key error when the sequence finally catches up.
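The SERIAL pitfall described above can be modeled in a few lines of Python. This is a toy. The class and its names are made up for illustration and are not PostgreSQL internals. A counter plays the sequence, a set plays the primary-key index, and a flag plays GENERATED ALWAYS.

```python
from __future__ import annotations


class DuplicateKeyError(Exception):
    """Toy stand-in for PostgreSQL's unique-violation error."""


class ToyTable:
    """A toy model of an auto-numbered id column (not how PostgreSQL is built).

    It mimics only the visible behavior: a sequence hands out the next number,
    and the primary key rejects a number that is already taken.
    """

    def __init__(self, generated_always: bool = False) -> None:
        self.last_value = 0              # last number the sequence handed out
        self.ids: set[int] = set()       # primary-key values already stored
        self.generated_always = generated_always

    def insert(self, explicit_id: int | None = None) -> int:
        if explicit_id is not None:
            if self.generated_always:
                # GENERATED ALWAYS AS IDENTITY refuses hand-picked ids by default.
                raise ValueError('cannot insert a non-DEFAULT value into column "id"')
            new_id = explicit_id         # SERIAL accepts it, but the sequence does not move
        else:
            self.last_value += 1         # like nextval(): consumed even if the insert fails
            new_id = self.last_value
        if new_id in self.ids:
            raise DuplicateKeyError(f"duplicate key value: id={new_id}")
        self.ids.add(new_id)
        return new_id


serial = ToyTable()
for _ in range(27):
    serial.insert()                      # the sequence is now sitting at 27
serial.insert(explicit_id=5000)          # someone hand-inserts id 5000

collision_at = None
while collision_at is None:
    try:
        serial.insert()                  # 28, 29, ... 4999 all succeed
    except DuplicateKeyError:
        collision_at = serial.last_value
print("sequence collided at", collision_at)  # 5000

identity = ToyTable(generated_always=True)
try:
    identity.insert(explicit_id=5000)
except ValueError as err:
    print("identity column:", err)
```

Because the toy consumes a number even when the insert fails, the next automatic insert after the collision gets 5001. That matches how PostgreSQL sequences leave gaps.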

SERIAL is fine for learning, but the identity column is the grown-up version. And duration_min is an integer. We need to know how long a movie is so we don't schedule overlapping showtimes. It's in minutes. The Matrix is 136 minutes, for example.

Release_year is just an integer. No NOT NULL constraint — we might not know the release year for every movie. And I want to point out something subtle here: we chose INTEGER, not a date type. A release year isn't a date — there's no month or day. Storing it as an integer lets us do things like SELECT all movies from the 1990s with WHERE release_year BETWEEN 1990 AND 1999. If it were a date, that query would be more awkward.
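You can try the BETWEEN query without installing anything. This sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no SERIAL, so the id column is INTEGER PRIMARY KEY, but the WHERE clause itself is the same.

```python
import sqlite3

# SQLite stand-in for the movies table (INTEGER PRIMARY KEY plays the role of SERIAL).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (
        id           INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        duration_min INTEGER NOT NULL,
        release_year INTEGER            -- nullable: the year may be unknown
    )
""")
conn.executemany(
    "INSERT INTO movies (title, duration_min, release_year) VALUES (?, ?, ?)",
    [("The Matrix", 136, 1999),
     ("Inception", 148, 2010),
     ("Everything Everywhere All at Once", 139, 2022)],
)

# An integer year turns "the 1990s" into a plain range check.
nineties = [row[0] for row in conn.execute(
    "SELECT title FROM movies WHERE release_year BETWEEN 1990 AND 1999"
)]
print(nineties)  # ['The Matrix']
```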

Alright, second table. CREATE TABLE screens open paren, id SERIAL PRIMARY KEY comma, name TEXT NOT NULL comma, capacity INTEGER NOT NULL, close paren semicolon.

Capacity is interesting. We could compute capacity from the number of seats in the seats table — just count them. But we're storing it here too. That's called denormalization. It's a tradeoff. It's redundant, but it's convenient. If you want to know how many seats Screen 1 has, you don't want to count rows in the seats table every time.

We're the ones designing this, so we get to make that call. Let me give a concrete example of when this tradeoff matters. Imagine the theater manager wants a dashboard that shows all three screens and their capacities. If we stored capacity only in the seats table, that dashboard query has to do a GROUP BY and COUNT across all seats every single time it loads. On 260 seats, that's trivial. On a stadium with 80,000 seats, that query starts to hurt. The denormalized capacity column turns a potentially expensive aggregation into a simple column read.

And the downside, of course, is that if you add a seat to Screen 1, you have to remember to update the capacity column too. If you forget, the number is wrong. That's the tradeoff in a nutshell — speed versus consistency.
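Here is a sketch of that drift risk, again with sqlite3 standing in for PostgreSQL. To keep it short, the screen is a made-up three-seat room. The query compares the cached capacity column with a live COUNT from the seats table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE screens (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        capacity INTEGER NOT NULL          -- denormalized: a cached seat count
    );
    CREATE TABLE seats (
        id          INTEGER PRIMARY KEY,
        screen_id   INTEGER NOT NULL REFERENCES screens (id),
        seat_number TEXT NOT NULL,
        row_number  TEXT NOT NULL
    );
    INSERT INTO screens (name, capacity) VALUES ('Screen 1', 3);
    INSERT INTO seats (screen_id, seat_number, row_number)
    VALUES (1, '1', 'A'), (1, '2', 'A'), (1, '3', 'A');
""")

# Cached value next to the value computed from the seats table.
DRIFT_QUERY = """
    SELECT sc.name, sc.capacity, COUNT(se.id) AS counted
    FROM screens sc
    LEFT JOIN seats se ON se.screen_id = sc.id
    GROUP BY sc.id, sc.name, sc.capacity
"""
print(conn.execute(DRIFT_QUERY).fetchall())  # [('Screen 1', 3, 3)]

# Add a seat but forget to bump capacity: the cached number is now stale.
conn.execute("INSERT INTO seats (screen_id, seat_number, row_number) VALUES (1, '1', 'B')")
print(conn.execute(DRIFT_QUERY).fetchall())  # [('Screen 1', 3, 4)]
```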

Third table — seats. CREATE TABLE seats open paren, id SERIAL PRIMARY KEY comma, screen underscore id INTEGER NOT NULL REFERENCES screens open paren id close paren comma, seat underscore number TEXT NOT NULL comma, row underscore number TEXT NOT NULL comma, UNIQUE open paren screen underscore id comma seat underscore number comma row underscore number close paren, close paren semicolon.

That REFERENCES screens id is our first foreign key. It means the database will reject any seat that claims to belong to a screen that doesn't exist. You can't have seat A1 in Screen 99 if Screen 99 isn't in the screens table. The database enforces this.

That UNIQUE constraint at the bottom — that prevents two seats in the same screen from having the same row and seat number. You can't have two A1s in Screen 1. But Screen 2 can have its own A1, because the uniqueness is per screen.

I want to note that we made seat_number and row_number TEXT, not INTEGER. That's because rows are often letters — A through J — and seat numbers might have variants like "1A" or be purely numeric. TEXT gives us flexibility.
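This short sqlite3 sketch shows both constraints firing. One SQLite-specific assumption: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, whereas PostgreSQL always enforces them. The helper name try_insert_seat is hypothetical, for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
    CREATE TABLE screens (id INTEGER PRIMARY KEY, name TEXT NOT NULL, capacity INTEGER NOT NULL);
    CREATE TABLE seats (
        id          INTEGER PRIMARY KEY,
        screen_id   INTEGER NOT NULL REFERENCES screens (id),
        seat_number TEXT NOT NULL,
        row_number  TEXT NOT NULL,
        UNIQUE (screen_id, seat_number, row_number)   -- uniqueness is per screen
    );
    INSERT INTO screens (name, capacity) VALUES ('Screen 1', 80), ('Screen 2', 60);
""")


def try_insert_seat(screen_id: int, seat_number: str, row_number: str) -> str:
    """Insert one seat and report whether the constraints accepted it."""
    try:
        conn.execute(
            "INSERT INTO seats (screen_id, seat_number, row_number) VALUES (?, ?, ?)",
            (screen_id, seat_number, row_number),
        )
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


results = [
    try_insert_seat(1, "1", "A"),   # ok
    try_insert_seat(1, "1", "A"),   # rejected by the UNIQUE constraint
    try_insert_seat(2, "1", "A"),   # ok: Screen 2 has its own A1
    try_insert_seat(99, "1", "A"),  # rejected by the foreign key: there is no Screen 99
]
for result in results:
    print(result)
```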

Alright, showtimes table. This is a core junction point. CREATE TABLE showtimes open paren, id SERIAL PRIMARY KEY comma, movie underscore id INTEGER NOT NULL REFERENCES movies open paren id close paren comma, screen underscore id INTEGER NOT NULL REFERENCES screens open paren id close paren comma, start underscore time TIMESTAMPTZ NOT NULL comma, price NUMERIC open paren six comma two close paren NOT NULL comma, UNIQUE open paren screen underscore id comma start underscore time close paren, close paren semicolon.

That's timestamp with time zone. This is important. PostgreSQL stores it as UTC internally, then converts to whatever time zone your session is using when you query it. If the theater is in New York and someone checks showtimes from Los Angeles, the times display correctly. If you use plain TIMESTAMP without time zone, you're just storing "8 PM" with no information about which 8 PM. That's a trap.

I've fallen into that trap. I once built an event-scheduling app that stored everything as TIMESTAMP WITHOUT TIME ZONE, and then we had users in four time zones. The bug reports were just... "Why does the meeting say 3 PM? I'm in Chicago, the organizer is in London." We had to migrate every row and add timezone offsets retroactively.

The lesson there is: TIMESTAMPTZ is almost always the right choice for events that real humans attend at a physical location. The only time you might use plain TIMESTAMP is for something like a log entry where the server's local time is all that matters.
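Python's timezone-aware datetimes are a decent analogy for what TIMESTAMPTZ does. The value is one instant, and each reader converts it to their own zone. To stay self-contained, this sketch uses fixed UTC offsets: -4 for New York and -7 for Los Angeles, both correct for late April 2026. It illustrates the idea and is not PostgreSQL's code.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for late April 2026 (daylight time). Real code would use named
# zones such as zoneinfo.ZoneInfo("America/New_York") so DST is handled for you.
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

# What the INSERT supplies: 7 PM with an explicit -04 offset.
start = datetime(2026, 4, 28, 19, 0, tzinfo=EDT)

# Roughly what TIMESTAMPTZ keeps: a single instant, normalized to UTC.
stored = start.astimezone(timezone.utc)
print(stored.isoformat())                  # 2026-04-28T23:00:00+00:00

# What each session sees when it reads the same value back.
print(stored.astimezone(EDT).isoformat())  # 2026-04-28T19:00:00-04:00
print(stored.astimezone(PDT).isoformat())  # 2026-04-28T16:00:00-07:00

# A naive datetime (like plain TIMESTAMP) carries no offset at all;
# Python won't even order it against an aware one.
naive = datetime(2026, 4, 28, 19, 0)
try:
    naive < stored
except TypeError as err:
    print("naive vs aware:", err)
```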

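NUMERIC six comma two means six total digits, two after the decimal point. So a ticket price can be up to 9999 dollars and 99 cents. The UNIQUE constraint on screen id and start time means you can't schedule two movies in the same screen at the exact same moment. It only catches identical start times, though. A 6:30 show that runs past a 7 PM show in the same screen would still get through, and catching that means checking against duration_min.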

I want to highlight why we used NUMERIC and not FLOAT for price. FLOAT is approximate. It stores numbers in binary, and most decimal fractions, ten cents included, have no exact binary form. Add 0.1 and 0.2 as floats and you get 0.30000000000000004, not 0.3. For money, you want exact decimal arithmetic. NUMERIC gives you that. Banks have been burned by FLOAT before.
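Python shows the same split between binary floats and exact decimals. Here, float behaves like PostgreSQL's FLOAT and the decimal module behaves like NUMERIC. A minimal sketch:

```python
from decimal import ROUND_HALF_UP, Decimal

# Binary floats can't represent 0.10 exactly, so repeated addition drifts.
float_total = sum([0.10] * 10)
print(float_total)          # 0.9999999999999999
print(float_total == 1.0)   # False

# Decimal, like NUMERIC, does exact base-10 arithmetic.
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))
print(decimal_total)        # 1.00
print(decimal_total == Decimal("1.00"))  # True

# NUMERIC(6,2) keeps two places after the point; quantize() is a rough
# Python analogue of forcing a value to two decimal places.
price = Decimal("12.505").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(price)                # 12.51
```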

This one's simple. CREATE TABLE customers open paren, id SERIAL PRIMARY KEY comma, name TEXT NOT NULL comma, email TEXT, close paren semicolon. Email is nullable — maybe someone buys a ticket in person and doesn't give an email.

Now the big one. This is the table that connects everything. CREATE TABLE bookings open paren, id SERIAL PRIMARY KEY comma, showtime underscore id INTEGER NOT NULL REFERENCES showtimes open paren id close paren comma, customer underscore id INTEGER NOT NULL REFERENCES customers open paren id close paren comma, seat underscore id INTEGER NOT NULL REFERENCES seats open paren id close paren comma, booked underscore at TIMESTAMPTZ DEFAULT NOW open paren close paren comma, UNIQUE open paren showtime underscore id comma seat underscore id close paren, close paren semicolon.

Three foreign keys in one table. A booking connects a customer to a seat at a showtime. The UNIQUE constraint on showtime id and seat id — that's the double-booking prevention. You cannot book the same seat twice for the same showtime. The database will throw an error if you try.

DEFAULT NOW — if you don't specify when the booking was made, PostgreSQL fills in the current timestamp automatically.
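Here is a stripped-down sqlite3 sketch of the double-booking guard. The showtimes and seats tables are reduced to bare id columns, and SQLite's CURRENT_TIMESTAMP stands in for PostgreSQL's NOW().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce REFERENCES
conn.executescript("""
    CREATE TABLE showtimes (id INTEGER PRIMARY KEY);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT);
    CREATE TABLE seats     (id INTEGER PRIMARY KEY);
    CREATE TABLE bookings (
        id          INTEGER PRIMARY KEY,
        showtime_id INTEGER NOT NULL REFERENCES showtimes (id),
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        seat_id     INTEGER NOT NULL REFERENCES seats (id),
        booked_at   TEXT DEFAULT CURRENT_TIMESTAMP,   -- SQLite's stand-in for NOW()
        UNIQUE (showtime_id, seat_id)                  -- no double-booking
    );
    INSERT INTO showtimes (id) VALUES (1);
    INSERT INTO seats (id) VALUES (1), (2);
    INSERT INTO customers (name, email)
        VALUES ('Alice', 'alice@example.com'), ('Bob', 'bob@example.com');
""")

conn.execute("INSERT INTO bookings (showtime_id, customer_id, seat_id) VALUES (1, 1, 1)")
booked_at = conn.execute("SELECT booked_at FROM bookings WHERE id = 1").fetchone()[0]
print("Alice's booking was stamped at", booked_at)  # filled in by the DEFAULT

try:
    # Bob tries to grab Alice's seat for the same showtime.
    conn.execute("INSERT INTO bookings (showtime_id, customer_id, seat_id) VALUES (1, 2, 1)")
    double_booked = True
except sqlite3.IntegrityError:
    double_booked = False
print("double booking allowed?", double_booked)  # False
```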

Now let's talk about why we don't store the movie title on the bookings table. This is the core normalization lesson.

Imagine we did. Every booking row has the movie title. Now the theater decides to rename "The Matrix" to "The Matrix: 25th Anniversary Edition." You now have to update tens of thousands of booking rows. And if you miss one, you have inconsistent data. Some bookings say one title, some say another.

Instead, the movie title lives in exactly one place — the movies table. Bookings connects to showtimes, showtimes connects to movies. If you need the title, you JOIN through those relationships. One update to the movies table, and every query that joins to it sees the new title instantly.

This is what normalization means at the beginner level — don't store the same fact in two places. Every piece of information should have one authoritative home.
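Here is the normalization payoff, sketched with sqlite3 and trimmed-down tables. A single UPDATE to movies is enough, and every booking that joins through showtimes sees the new title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (id INTEGER PRIMARY KEY,
                            movie_id INTEGER NOT NULL REFERENCES movies (id));
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY,
                            showtime_id INTEGER NOT NULL REFERENCES showtimes (id));
    INSERT INTO movies (id, title) VALUES (1, 'The Matrix');
    INSERT INTO showtimes (id, movie_id) VALUES (1, 1), (2, 1);
    INSERT INTO bookings (showtime_id) VALUES (1), (1), (2);
""")

# The title lives only in movies; bookings reach it through showtimes.
TITLES_PER_BOOKING = """
    SELECT b.id, m.title
    FROM bookings b
    JOIN showtimes s ON s.id = b.showtime_id
    JOIN movies m    ON m.id = s.movie_id
    ORDER BY b.id
"""

# One UPDATE on one row...
conn.execute("UPDATE movies SET title = 'The Matrix: 25th Anniversary Edition' WHERE id = 1")

# ...and every booking sees the new title, with nothing else to keep in sync.
rows = conn.execute(TITLES_PER_BOOKING).fetchall()
for booking_id, title in rows:
    print(booking_id, title)
```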

I want to push on that a little. You mentioned denormalization with the capacity column earlier. Are we being inconsistent? We said denormalization is okay for capacity, but here we're saying normalization is sacred for movie titles.

That's a fair question. The difference is how often the data changes and what the consequences are. Capacity changes maybe once every few years when they renovate. A movie title could theoretically change, but in practice, it's also rare. The real difference is that the movie title already exists in the movies table — we'd be duplicating it. The capacity is a computed fact that we're caching for performance. The principle is: normalize by default, denormalize only when you've measured a performance problem.

That's a good rule of thumb. Alright, we've got our six CREATE TABLE statements. Let's put some data in. We'll INSERT a handful of rows into each table so our queries return real results.

INSERT INTO movies open paren title comma duration underscore min comma release underscore year close paren VALUES open paren single quote The Matrix single quote comma 136 comma 1999 close paren semicolon.

Let me do a couple more. INSERT INTO movies open paren title comma duration underscore min comma release underscore year close paren VALUES open paren single quote Inception single quote comma 148 comma 2010 close paren semicolon. And one more — INSERT INTO movies open paren title comma duration underscore min comma release underscore year close paren VALUES open paren single quote Everything Everywhere All at Once single quote comma 139 comma 2022 close paren semicolon.

INSERT INTO screens open paren name comma capacity close paren VALUES open paren single quote Screen 1 single quote comma 80 close paren comma open paren single quote Screen 2 single quote comma 60 close paren comma open paren single quote Screen 3 single quote comma 120 close paren semicolon. Three screens, different sizes.

We're not going to insert all 260 seats manually. We'll do a few to have something in the table. INSERT INTO seats open paren screen underscore id comma seat underscore number comma row underscore number close paren VALUES open paren 1 comma single quote 1 single quote comma single quote A single quote close paren comma open paren 1 comma single quote 2 single quote comma single quote A single quote close paren comma open paren 1 comma single quote 3 single quote comma single quote A single quote close paren comma open paren 1 comma single quote 1 single quote comma single quote B single quote close paren semicolon. Four seats in Screen 1.

In a real theater you'd script this — a loop that generates rows A through J, seats 1 through whatever. But for learning, four seats is enough.
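Here is one way that script might look in Python. The layout is an assumption for illustration: rows A through J with eight seats each, which adds up to Screen 1's capacity of 80. sqlite3 again stands in for PostgreSQL, and generate_seats is a hypothetical helper name.

```python
from __future__ import annotations

import sqlite3
from string import ascii_uppercase


def generate_seats(screen_id: int, row_count: int, seats_per_row: int) -> list[tuple[int, str, str]]:
    """Build (screen_id, seat_number, row_number) tuples: rows A, B, C..., seats 1..N."""
    return [
        (screen_id, str(seat), row_letter)
        for row_letter in ascii_uppercase[:row_count]
        for seat in range(1, seats_per_row + 1)
    ]


# Assumed layout: Screen 1 as rows A-J with 8 seats each (80 seats).
screen_1 = generate_seats(screen_id=1, row_count=10, seats_per_row=8)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seats (
        id          INTEGER PRIMARY KEY,
        screen_id   INTEGER NOT NULL,
        seat_number TEXT NOT NULL,
        row_number  TEXT NOT NULL,
        UNIQUE (screen_id, seat_number, row_number)
    )
""")
conn.executemany(
    "INSERT INTO seats (screen_id, seat_number, row_number) VALUES (?, ?, ?)", screen_1
)
print(conn.execute("SELECT COUNT(*) FROM seats").fetchone()[0])  # 80
print(screen_1[0], screen_1[-1])  # (1, '1', 'A') (1, '8', 'J')
```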

Just so the listener can picture it, those four seats form a tiny grid. Row A has seats 1, 2, and 3, and that's the front row. Row B has just seat 1. Once the bookings go in, if you're looking at the screen, Alice is sitting in the front row, far left, in A1. Bob is next to her in A2. A3 stays empty, and so does B1, the only seat in row B.

Now showtimes. We need some dated today, April 28, for our later queries. INSERT INTO showtimes open paren movie underscore id comma screen underscore id comma start underscore time comma price close paren VALUES open paren 1 comma 1 comma single quote 2026 dash 04 dash 28 19 colon 00 colon 00 dash 04 single quote comma 12 dot 50 close paren semicolon.

That's The Matrix, Screen 1, today at 7 PM Eastern. The dash 04 is the UTC offset for Eastern Daylight Time. PostgreSQL understands this and converts it to UTC internally.

Let's add a couple more. INSERT INTO showtimes open paren movie underscore id comma screen underscore id comma start underscore time comma price close paren VALUES open paren 2 comma 2 comma single quote 2026 dash 04 dash 28 20 colon 00 colon 00 dash 04 single quote comma 14 dot 00 close paren comma open paren 3 comma 3 comma single quote 2026 dash 04 dash 28 18 colon 30 colon 00 dash 04 single quote comma 11 dot 00 close paren semicolon.

Inception at 8 PM in Screen 2, Everything Everywhere All at Once at 6:30 PM in Screen 3.

INSERT INTO customers open paren name comma email close paren VALUES open paren single quote Alice single quote comma single quote alice at example dot com single quote close paren comma open paren single quote Bob single quote comma single quote bob at example dot com single quote close paren comma open paren single quote Charlie single quote comma NULL close paren semicolon.

Charlie bought a ticket in person, no email. That NULL is fine because we didn't put NOT NULL on email.

INSERT INTO bookings open paren showtime underscore id comma customer underscore id comma seat underscore id close paren VALUES open paren 1 comma 1 comma 1 close paren comma open paren 1 comma 2 comma 2 close paren semicolon. Alice and Bob both booked seats for The Matrix. Alice has seat A1, Bob has A2.

Now we have real data. Let's query it. First query — all movies showing today.

SELECT m dot title comma s dot start underscore time FROM movies m JOIN showtimes s ON m dot id equals s dot movie underscore id WHERE s dot start underscore time colon colon date equals CURRENT underscore DATE semicolon.

Let's break that down. m and s are table aliases — shorthand so we don't have to type "movies" and "showtimes" everywhere. The JOIN pairs up rows from movies and showtimes where the movie id matches. The colon colon date casts the timestamptz to a date — stripping off the time part — so we can compare it to CURRENT_DATE, which is today's date.

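If you run that on April 28, the date we used in the inserts, with your session set to Eastern time, you'll get three rows: The Matrix, Inception, and Everything Everywhere All at Once, with their start times. Those are all the movies we scheduled for today. One subtlety: the cast to date happens in the session's time zone. In a UTC session, the 8 PM Eastern Inception showing is already April 29, so it drops out.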

Now here's a beginner trap. If you wrote WHERE s.start_time equals CURRENT_DATE without the cast, it wouldn't work. CURRENT_DATE is a date, start_time is a timestamptz. PostgreSQL would try to compare midnight of today to a specific time, and nothing would match.

That's such a common mistake. I've seen people write WHERE start_time equals CURRENT_DATE and then stare at an empty result set wondering why their data disappeared. The database isn't broken — it's doing exactly what you asked. You asked for rows where the timestamp is exactly midnight, and your showtimes are at 7 PM.
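Both date behaviors can be reproduced with Python datetimes. The helpers showing_on and exact_midnight_match are hypothetical names. The first mimics the cast comparison and the second mimics the uncast one. The fixed offsets assume late April 2026 daylight time. The sketch also shows that the date part depends on which zone does the converting, which mirrors PostgreSQL converting in the session's time zone. Viewed in UTC, the 8 PM Eastern Inception showing falls on April 29.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # fixed offset, right for late April 2026
UTC = timezone.utc

# The three showtimes from the sample inserts.
showtimes = {
    "The Matrix": datetime(2026, 4, 28, 19, 0, tzinfo=EDT),
    "Inception": datetime(2026, 4, 28, 20, 0, tzinfo=EDT),
    "Everything Everywhere All at Once": datetime(2026, 4, 28, 18, 30, tzinfo=EDT),
}
TODAY = date(2026, 4, 28)


def showing_on(day: date, session_zone: timezone) -> list[str]:
    """Mimic start_time::date = CURRENT_DATE: convert to the zone, keep the date part."""
    return [title for title, start in showtimes.items()
            if start.astimezone(session_zone).date() == day]


def exact_midnight_match(day: date, session_zone: timezone) -> list[str]:
    """Mimic the uncast comparison: the date is promoted to midnight in the zone."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=session_zone)
    return [title for title, start in showtimes.items() if start == midnight]


print(showing_on(TODAY, EDT))            # all three titles
print(showing_on(TODAY, UTC))            # Inception is missing: 00:00 on the 29th in UTC
print(exact_midnight_match(TODAY, EDT))  # []: nothing starts exactly at midnight
```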

Second query — finding available seats for a specific showtime. This is the one that uses the LEFT JOIN anti-join pattern. Here we go. SELECT se dot id comma se dot row underscore number comma se dot seat underscore number FROM seats se JOIN screens sc ON se dot screen underscore id equals sc dot id JOIN showtimes sh ON sh dot screen underscore id equals sc dot id AND sh dot id equals 1 LEFT JOIN bookings b ON b dot seat underscore id equals se dot id AND b dot showtime underscore id equals sh dot id WHERE b dot id IS NULL semicolon.

That's a mouthful. Talk through it.

We start with all seats. We join to screens to find which screen the seat is in, then to showtimes to find which showtime is happening in that screen — and we specifically ask for showtime id 1. Then the LEFT JOIN to bookings is the key. LEFT JOIN means "keep every row from the left side, even if there's no match on the right." If there's no booking for a particular seat at this showtime, the booking columns are all NULL. Then WHERE b dot id IS NULL filters to only those rows — the seats with no booking.

If you run this for showtime 1, which is The Matrix in Screen 1, you'll see seats A3 and B1 — the two seats Alice and Bob didn't book. Alice took A1, Bob took A2.
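The available-seats query runs unchanged in SQLite. This sqlite3 sketch loads the episode's sample screens, seats, showtimes, and bookings, with the tables trimmed to the columns the query touches. The ORDER BY is an addition so that the output is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE screens   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, capacity INTEGER NOT NULL);
    CREATE TABLE seats     (id INTEGER PRIMARY KEY, screen_id INTEGER NOT NULL REFERENCES screens (id),
                            seat_number TEXT NOT NULL, row_number TEXT NOT NULL);
    CREATE TABLE showtimes (id INTEGER PRIMARY KEY, movie_id INTEGER NOT NULL,
                            screen_id INTEGER NOT NULL REFERENCES screens (id));
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY,
                            showtime_id INTEGER NOT NULL REFERENCES showtimes (id),
                            customer_id INTEGER NOT NULL,
                            seat_id INTEGER NOT NULL REFERENCES seats (id));
    INSERT INTO screens (name, capacity) VALUES ('Screen 1', 80), ('Screen 2', 60), ('Screen 3', 120);
    INSERT INTO seats (screen_id, seat_number, row_number)
        VALUES (1, '1', 'A'), (1, '2', 'A'), (1, '3', 'A'), (1, '1', 'B');
    INSERT INTO showtimes (movie_id, screen_id) VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO bookings (showtime_id, customer_id, seat_id) VALUES (1, 1, 1), (1, 2, 2);
""")

available = conn.execute("""
    SELECT se.id, se.row_number, se.seat_number
    FROM seats se
    JOIN screens sc   ON se.screen_id = sc.id
    JOIN showtimes sh ON sh.screen_id = sc.id AND sh.id = 1
    LEFT JOIN bookings b ON b.seat_id = se.id AND b.showtime_id = sh.id
    WHERE b.id IS NULL          -- no matching booking: the seat is free
    ORDER BY se.id
""").fetchall()
print(available)  # [(3, 'A', '3'), (4, 'B', '1')]
```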

This is called an anti-join. Beginners often try something like WHERE seat underscore id NOT IN open paren SELECT seat underscore id FROM bookings WHERE showtime underscore id equals 1 close paren. That works, but the LEFT JOIN IS NULL pattern is often faster in PostgreSQL. The query planner knows how to optimize it.

There's another reason to prefer LEFT JOIN over NOT IN. If your subquery returns any NULLs, NOT IN can give you surprising results. If there's even one NULL in that list, the whole NOT IN evaluates to unknown, and you get zero rows back. It's a gotcha that has burned countless people.

Oh, that's nasty. So NOT IN with a subquery that might contain NULL is basically a landmine.

LEFT JOIN IS NULL doesn't have that problem. It's explicit about what you want.
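Here is a sketch of the NOT IN landmine. It can't happen with our bookings table, because bookings.seat_id is NOT NULL. Instead, the sketch uses a hypothetical holds table with a nullable seat_id to show what a single NULL does to NOT IN compared with the LEFT JOIN version. SQLite follows the same NULL rules here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seats (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- Hypothetical: a holds table whose seat_id IS nullable (unlike our bookings).
    CREATE TABLE holds (id INTEGER PRIMARY KEY, seat_id INTEGER);
    INSERT INTO seats (id, label) VALUES (1, 'A1'), (2, 'A2'), (3, 'A3'), (4, 'B1');
    INSERT INTO holds (seat_id) VALUES (1), (2);
""")

NOT_IN = "SELECT label FROM seats WHERE id NOT IN (SELECT seat_id FROM holds) ORDER BY id"
ANTI_JOIN = """
    SELECT s.label FROM seats s
    LEFT JOIN holds h ON h.seat_id = s.id
    WHERE h.id IS NULL
    ORDER BY s.id
"""

print(conn.execute(NOT_IN).fetchall())     # [('A3',), ('B1',)]
print(conn.execute(ANTI_JOIN).fetchall())  # [('A3',), ('B1',)]

# One row with a NULL seat_id sneaks in...
conn.execute("INSERT INTO holds (seat_id) VALUES (NULL)")
not_in_after = conn.execute(NOT_IN).fetchall()
anti_join_after = conn.execute(ANTI_JOIN).fetchall()
print(not_in_after)     # []: every NOT IN test is now unknown
print(anti_join_after)  # [('A3',), ('B1',)]
```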

This is a great illustration of INNER JOIN versus LEFT JOIN. INNER JOIN says "only give me rows where there's a match in both tables." LEFT JOIN says "give me all rows from the left table, and fill in NULLs from the right table where there's no match." The available-seats query needs LEFT JOIN because we want all seats — the ones with bookings and the ones without.

If you used INNER JOIN here, you'd only see seats that have bookings. The exact opposite of what you want. You'd be showing the unavailable seats instead of the available ones.

Third query — total tickets sold per movie this week, ordered by most popular. This is our GROUP BY and aggregate. SELECT m dot title comma COUNT open paren b dot id close paren AS tickets underscore sold FROM movies m JOIN showtimes s ON m dot id equals s dot movie underscore id JOIN bookings b ON b dot showtime underscore id equals s dot id WHERE s dot start underscore time greater than or equals date underscore trunc open paren single quote week single quote comma CURRENT underscore DATE close paren GROUP BY m dot id comma m dot title ORDER BY tickets underscore sold DESC semicolon.

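Date_trunc with 'week' gives us midnight on Monday of the current week, so this filters to showtimes from this Monday onward. COUNT of b dot id counts bookings, but only non-null IDs. That matters because of the join. With a plain JOIN to bookings, a movie with no bookings this week doesn't show up at all. If you want it listed with zero, make that a LEFT JOIN to bookings. The unmatched rows then carry a NULL b dot id, and COUNT of b dot id turns that into zero.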

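Here's a critical beginner error to preempt. GROUP BY collapses rows. Every column in your SELECT clause must meet one of three conditions: it's in the GROUP BY clause, it's wrapped in an aggregate function like COUNT, SUM, or AVG, or it's fully determined by a primary key you grouped by. That last case is why grouping by m dot id already covers m dot title. But if you grouped by release year and selected m dot title, PostgreSQL would throw an error. MySQL might let you get away with it, depending on version and settings, but PostgreSQL is strict.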

Which is actually a good thing. MySQL's lax behavior can hide bugs. PostgreSQL forces you to be explicit about what you're grouping.

I want to give a concrete example of why MySQL's behavior is dangerous. Imagine a query that groups by release year but selects the title without putting it in GROUP BY. Older MySQL versions, or newer ones with ONLY_FULL_GROUP_BY switched off, will just pick an arbitrary title from each year's group and show it to you. It won't error. If most years have one movie, you get the right answer 99 times out of 100. Then one day two movies share a year, you get the wrong title, and you never notice. PostgreSQL refuses to let you write that query. It's protecting you from yourself.

Also, WHERE versus HAVING. WHERE filters rows before grouping. HAVING filters groups after grouping. If you want to say "only show movies with more than 10 tickets sold," that's HAVING COUNT open paren b dot id close paren greater than 10. You cannot put that in WHERE — the aggregate hasn't been computed yet when WHERE runs.

That's a great distinction. WHERE is the bouncer at the door deciding who gets into the party. HAVING is the judge at the end deciding which groups make the podium. Different stages, different rules.
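Here is a sqlite3 sketch of the grouping rules. The week filter is dropped because these trimmed tables have no timestamps. The sketch shows COUNT returning zero under a LEFT JOIN, HAVING filtering groups, and an aggregate in WHERE being refused. SQLite is more permissive than PostgreSQL about ungrouped columns, so it can't demonstrate that particular error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (id INTEGER PRIMARY KEY, movie_id INTEGER NOT NULL REFERENCES movies (id));
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY, showtime_id INTEGER NOT NULL REFERENCES showtimes (id));
    INSERT INTO movies (id, title) VALUES
        (1, 'The Matrix'), (2, 'Inception'), (3, 'Everything Everywhere All at Once');
    INSERT INTO showtimes (id, movie_id) VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO bookings (showtime_id) VALUES (1), (1);   -- Alice and Bob, both for The Matrix
""")

# LEFT JOIN keeps movies with no bookings; COUNT(b.id) skips the NULLs, giving 0.
per_movie = conn.execute("""
    SELECT m.title, COUNT(b.id) AS tickets_sold
    FROM movies m
    JOIN showtimes s     ON m.id = s.movie_id
    LEFT JOIN bookings b ON b.showtime_id = s.id
    GROUP BY m.id, m.title
    ORDER BY tickets_sold DESC, m.id
""").fetchall()
print(per_movie)  # [('The Matrix', 2), ('Inception', 0), ('Everything Everywhere All at Once', 0)]

# HAVING filters groups after the counts exist.
popular = conn.execute("""
    SELECT m.title, COUNT(b.id) AS tickets_sold
    FROM movies m
    JOIN showtimes s ON m.id = s.movie_id
    JOIN bookings b  ON b.showtime_id = s.id
    GROUP BY m.id, m.title
    HAVING COUNT(b.id) > 1
""").fetchall()
print(popular)  # [('The Matrix', 2)]

# An aggregate in WHERE fails: WHERE runs before any counting happens.
try:
    conn.execute("""
        SELECT m.title FROM movies m
        JOIN showtimes s ON m.id = s.movie_id
        JOIN bookings b  ON b.showtime_id = s.id
        WHERE COUNT(b.id) > 1
    """)
    where_error = None
except sqlite3.OperationalError as err:
    where_error = str(err)
print(where_error)
```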

Fourth query — this is the climax. A window function with a CTE. We're going to rank movies by total revenue.

WITH movie underscore revenue AS open paren SELECT m dot title comma SUM open paren s dot price close paren AS total underscore revenue FROM movies m JOIN showtimes s ON m dot id equals s dot movie underscore id JOIN bookings b ON b dot showtime underscore id equals s dot id GROUP BY m dot id comma m dot title close paren SELECT title comma total underscore revenue comma RANK open paren close paren OVER open paren ORDER BY total underscore revenue DESC close paren AS revenue underscore rank FROM movie underscore revenue ORDER BY revenue underscore rank semicolon.

That WITH clause is a CTE — a Common Table Expression. It's basically a named subquery. You define it once at the top, then you can reference it as if it were a table in the main query. It makes complex queries readable.

The RANK function is a window function. It assigns a rank to each row based on the ORDER BY inside the OVER clause. Unlike GROUP BY, window functions don't collapse rows — every row keeps its identity, and the rank is added as a new column. If two movies tie for revenue, they get the same rank, and the next rank is skipped. So you might see ranks 1, 1, 3. DENSE underscore RANK doesn't skip — it would give you 1, 1, 2.
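The revenue CTE runs in SQLite 3.25 or newer, which supports window functions. In this sketch, prices are stored as integer cents rather than NUMERIC(6,2). The booking counts are invented so that two movies tie, which makes the difference between RANK and DENSE_RANK visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (id INTEGER PRIMARY KEY, movie_id INTEGER NOT NULL,
                            price_cents INTEGER NOT NULL);  -- cents instead of NUMERIC(6,2)
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY, showtime_id INTEGER NOT NULL);
    INSERT INTO movies (id, title) VALUES
        (1, 'The Matrix'), (2, 'Inception'), (3, 'Everything Everywhere All at Once');
    INSERT INTO showtimes (id, movie_id, price_cents) VALUES (1, 1, 1250), (2, 2, 1400), (3, 3, 1100);
""")

# Invented sales: 28 x 12.50 and 25 x 14.00 both come to 350.00, forcing a tie.
conn.executemany(
    "INSERT INTO bookings (showtime_id) VALUES (?)",
    [(1,)] * 28 + [(2,)] * 25 + [(3,)] * 10,
)

ranked = conn.execute("""
    WITH movie_revenue AS (
        SELECT m.title, SUM(s.price_cents) AS total_revenue_cents
        FROM movies m
        JOIN showtimes s ON m.id = s.movie_id
        JOIN bookings b  ON b.showtime_id = s.id
        GROUP BY m.id, m.title
    )
    SELECT title,
           total_revenue_cents,
           RANK()       OVER (ORDER BY total_revenue_cents DESC) AS revenue_rank,
           DENSE_RANK() OVER (ORDER BY total_revenue_cents DESC) AS dense_revenue_rank
    FROM movie_revenue
    ORDER BY revenue_rank, title
""").fetchall()

for row in ranked:
    print(row)
# ('Inception', 35000, 1, 1)
# ('The Matrix', 35000, 1, 1)
# ('Everything Everywhere All at Once', 11000, 3, 2)
```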

Let me give an analogy for window functions, because they're weird the first time. Imagine you're in a classroom, and the teacher says "everyone stand up, but stay at your desk." GROUP BY would be "everyone go stand in groups by height." Your individual identity disappears into the group. A window function is more like the teacher walking around and putting a sticky note on each desk with your rank in the class. You're still you, you're still at your desk, but now you have a new piece of information attached.

You keep your row, you just get an extra column of computed context.

Window functions feel like magic when you first encounter them. You're doing a calculation across a set of rows without losing the individual row detail. It's genuinely one of the most powerful features in SQL.

A beginner who's followed along with this episode has now written a CTE with a window function. That's advanced PostgreSQL, and it happened in what, under an hour?

Let's talk about a few more concepts we've been using implicitly. Every PRIMARY KEY column is automatically indexed. That's why lookups by id are fast. If you frequently query by start_time — say, "find all showtimes starting between 6 PM and 9 PM" — you'd want to create an index on that column. CREATE INDEX idx underscore showtimes underscore start underscore time ON showtimes open paren start underscore time close paren. An index is like the index at the back of a book — it tells the database exactly where to find the rows without scanning the entire table.

The tradeoff with indexes is worth mentioning. Every index makes writes slower. When you INSERT a new showtime, PostgreSQL has to update the table and all its indexes. So you don't just index every column. You index the columns you actually search on. It's a classic read-versus-write tradeoff.

Foreign keys aren't just documentation. They're enforced constraints. If you try to insert a booking with a showtime id that doesn't exist, PostgreSQL rejects it. If you try to delete a movie that still has showtimes referencing it, PostgreSQL rejects that too — unless you set up ON DELETE CASCADE, which would automatically delete the showtimes. We didn't do that here because you probably don't want to accidentally wipe out your booking history.
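This minimal sketch demonstrates both rejections, again using in-memory SQLite with hypothetical, trimmed-down tables. Note one difference: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas PostgreSQL always enforces them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite needs this pragma to enforce foreign keys; PostgreSQL always enforces them.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL REFERENCES movies(id)      -- default: reject deletes
    );
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        showtime_id INTEGER NOT NULL REFERENCES showtimes(id)
    );
    INSERT INTO movies (id, title) VALUES (1, 'The Matrix');
    INSERT INTO showtimes (id, movie_id) VALUES (10, 1);
""")

errors = []
for label, sql in [
    ("booking for a missing showtime", "INSERT INTO bookings (showtime_id) VALUES (999)"),
    ("delete a movie that still has showtimes", "DELETE FROM movies WHERE id = 1"),
]:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append((label, str(exc)))

# The rejected DELETE changed nothing, so the movie is still there.
movies_left = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]

for label, msg in errors:
    print(f"{label}: rejected ({msg})")
print("movies left:", movies_left)
```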

Imagine a theater manager deletes "The Matrix" from the movies table because they're not screening it anymore. If CASCADE is set on both the showtimes and the bookings foreign keys, that delete silently wipes out every booking that ever referenced a Matrix showtime. That's historical sales data, gone. For financial records, you almost always want the default behavior: reject the delete and make the user explicitly clean up the dependencies first.
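For contrast, this sketch shows what CASCADE does under the same SQLite stand-in. The tables are hypothetical, with ON DELETE CASCADE on both foreign keys. Deleting one movie raises no error and quietly removes every dependent booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to act on the constraints
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL REFERENCES movies(id) ON DELETE CASCADE
    );
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        showtime_id INTEGER NOT NULL REFERENCES showtimes(id) ON DELETE CASCADE
    );
    INSERT INTO movies (id, title) VALUES (1, 'The Matrix');
    INSERT INTO showtimes (id, movie_id) VALUES (10, 1), (11, 1);
    INSERT INTO bookings (showtime_id) VALUES (10), (10), (11);
""")

def count(table):
    # `table` is only ever one of the fixed names below, never user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

bookings_before = count("bookings")
conn.execute("DELETE FROM movies WHERE id = 1")  # succeeds: no error, no warning
bookings_after = count("bookings")
showtimes_after = count("showtimes")

print(f"bookings: {bookings_before} -> {bookings_after}")
```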

Let's also mention quoting. This is the number one beginner mistake in PostgreSQL. Single quotes for string values — 'The Matrix'. Double quotes are for identifiers — table names, column names. If you write "The Matrix" with double quotes, PostgreSQL thinks you're referring to a column named The Matrix, and it will error because that column doesn't exist.

I've done this so many times. You're typing along, feeling confident, and then — error: column "The Matrix" does not exist. And you stare at it thinking, "I'm not trying to reference a column, I'm giving you a value!" But the double quotes tell PostgreSQL otherwise. Single quotes for values, double quotes for identifiers. Or better yet, just use single quotes for values and avoid double quotes entirely by never creating tables or columns with uppercase letters or spaces.

Every statement in psql ends with a semicolon. If you forget it, psql thinks you're still typing, and the prompt changes from =# to -# while it waits for more input. Just type a semicolon on the new line and hit enter.

Another one: ambiguous column names. When you join movies and showtimes, both tables have an id column. If you write just "id" in your SELECT, PostgreSQL doesn't know which one you mean. You have to qualify it with the table alias: m.id or s.id.

That error message is actually pretty clear — "column reference id is ambiguous" — but when you're new, it's intimidating. The fix is always the same: prefix the column with the table alias.
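This sketch shows the error and the fix, using in-memory SQLite with hypothetical tables. SQLite words the message as "ambiguous column name: id" instead of using PostgreSQL's phrasing, but the cause and the fix are the same.

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL; hypothetical trimmed tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE showtimes (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL REFERENCES movies(id),
        start_time TEXT NOT NULL
    );
    INSERT INTO movies (id, title) VALUES (1, 'The Matrix');
    INSERT INTO showtimes (id, movie_id, start_time) VALUES (10, 1, '2025-04-28 19:00');
""")

# Unqualified "id" exists in both joined tables, so the query is rejected.
try:
    conn.execute("SELECT id, title FROM movies m JOIN showtimes s ON s.movie_id = m.id")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Prefixing each id with its table alias removes the ambiguity.
rows = conn.execute("""
    SELECT m.id AS movie_id, s.id AS showtime_id, m.title, s.start_time
    FROM movies m
    JOIN showtimes s ON s.movie_id = m.id
""").fetchall()

print(ambiguous_error)
print(rows)
```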

Alright, let's step back and appreciate what we've built. Six tables, properly normalized, with foreign keys enforcing referential integrity. We can find movies showing today, check seat availability, aggregate ticket sales, and rank movies by revenue. All in PostgreSQL, all from the command line.

The whole thing was done in audio. No diagrams, no screen sharing. Just talking through the relationships.

That's the thesis of these challenge episodes. You don't need to see a database schema to understand it. You need to hear the relationships. A showtime belongs to one movie and one screen. A booking links a customer to a seat at a showtime. Those are sentences. Once you can say them, you can model them.

There's a deeper point here about how we learn technical concepts. So much of programming education is visually oriented — diagrams, code on screens, videos. But databases are fundamentally about language. They're about naming things and describing how those things relate. That's an auditory, verbal skill. You can absolutely learn it through conversation.

Now: Hilbert's daily fun fact.

The shortest war in recorded history was the Anglo-Zanzibar War of 1896, which lasted between 38 and 45 minutes.

That's shorter than this podcast episode.

So what should our listener actually do after this episode? First, install PostgreSQL. Get psql working. Create the cinema database. Type out those six CREATE TABLE statements. Type out the INSERT statements. Run the queries. Break things on purpose: try inserting a booking for a seat that doesn't exist and watch PostgreSQL reject it.

Then change the schema. Add a genres table and a movie_genres junction table. Add a screenings table that separates the concept of "this movie is playing" from "this specific showing at this specific time." Experiment with indexes: create one on start_time, then use EXPLAIN ANALYZE to check whether your date-filter queries get faster.
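Here is one way the genres exercise might look. The table and column names are hypothetical, and in-memory SQLite stands in for PostgreSQL. The junction table holds one row per movie/genre pair, and its composite primary key prevents the same pair from being inserted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Junction table: one row per (movie, genre) pair.
    CREATE TABLE movie_genres (
        movie_id INTEGER NOT NULL REFERENCES movies(id),
        genre_id INTEGER NOT NULL REFERENCES genres(id),
        PRIMARY KEY (movie_id, genre_id)
    );
    INSERT INTO movies (id, title) VALUES (1, 'The Matrix'), (2, 'Alien');
    INSERT INTO genres (id, name) VALUES (1, 'Sci-Fi'), (2, 'Action'), (3, 'Horror');
    INSERT INTO movie_genres (movie_id, genre_id) VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")

# Many-to-many lookup: walk movies -> movie_genres -> genres.
sci_fi = [title for (title,) in conn.execute("""
    SELECT m.title
    FROM movies m
    JOIN movie_genres mg ON mg.movie_id = m.id
    JOIN genres g ON g.id = mg.genre_id
    WHERE g.name = 'Sci-Fi'
    ORDER BY m.title
""")]

# Tagging The Matrix as Sci-Fi a second time violates the composite primary key.
try:
    conn.execute("INSERT INTO movie_genres (movie_id, genre_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(sci_fi, duplicate_rejected)
```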

The real learning happens when you start modifying. What if you want to track cancellations? Add a cancelled_at column to bookings. What if you want loyalty points? Add a points column to customers. Every real-world feature is just another column or another table.
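This sketch illustrates the cancellation idea under the same assumptions: a hypothetical, trimmed-down bookings table, SQLite as the stand-in, and timestamps stored as text. In PostgreSQL you would more likely declare the column as TIMESTAMPTZ. A NULL cancelled_at means the booking is still active, so availability checks only need to filter on IS NULL.

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL; hypothetical trimmed bookings table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        showtime_id INTEGER NOT NULL,
        seat_id INTEGER NOT NULL
    );
    INSERT INTO bookings (showtime_id, seat_id) VALUES (10, 1), (10, 2), (10, 3);
""")

# The new feature is just a new column; existing rows get NULL (not cancelled).
conn.execute("ALTER TABLE bookings ADD COLUMN cancelled_at TEXT")
conn.execute("UPDATE bookings SET cancelled_at = '2025-04-28 17:30' WHERE seat_id = 2")

# Seats still held for showtime 10 are the bookings that were never cancelled.
active = [seat for (seat,) in conn.execute("""
    SELECT seat_id FROM bookings
    WHERE showtime_id = 10 AND cancelled_at IS NULL
    ORDER BY seat_id
""")]

print(active)
```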

The concepts here — normalization, foreign keys, joins, grouping, window functions — these transfer to every relational database. MySQL, SQLite, SQL Server. The syntax varies slightly, but the ideas are identical.

One forward-looking thought. The database you build for a small movie theater is the same shape as the database for an airline reservation system. Seats become seats on a flight. Showtimes become flights. Screens become aircraft. The patterns scale. Learn them once, and you see them everywhere.

That's the thing about relational modeling. Once your brain learns to see entities and relationships, you can't unsee them. You'll walk into a coffee shop and start mentally modeling the orders table. You'll book a hotel room and think about the foreign key from reservation to room. It's a curse and a superpower.

Thanks to our producer Hilbert Flumingtop for keeping us on the rails. This has been My Weird Prompts. Find us at myweirdprompts.com, or search for us on Spotify.

Go build something.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
