When it comes to writing SQL, it’s a common predicament: You start with something clear and simple that grows and grows into something completely illegible. One subselect becomes a subselect within a subselect, quickly turning into a ball of mud. Enter Common Table Expressions (CTEs).
Benefits of Common Table Expressions
CTEs shine as a way to…
Break down your problem into smaller bits
Keep things things tidy and legible
Reuse resultsets (unlike subqueries)
Think of CTEs as temporary resultsets you can use in the rest of your query. Sometimes they’re called WITH clauses.
Not all databases support CTEs, but the ones that do typically feature a syntax like the following…
A Practical Example
Consider the following use case I encountered recently. We have two tables: deliveries and clients. Clients have many deliveries. We need to figure out how many deliveries each client has had in the last week, broken down by day. So I whipped up a simple query…
In English, we’re asking for delivery counts, broken down by client, for the last week. We get something like this –
Not too shabby for such a simple query. But a couple wild business requirements appear!
We need rows for days where no clients delivered anything (one row per client) – filling the gaps.
If we have more than 5 clients, let’s just grab the top five for the week, and don’t include us (Beekeeper) in the report.
This shouldn’t be too hard, right?
Requirement 1: Filling Gaps
Let’s try to tackle #1 first. We will need to start by considering every date in our daterange, instead of using only dates provided to us in the deliveries table. No easy way to add that dynamically springs to mind, and the best answer that comes up in a google search involves a calendar table. At some point I suppose it usually seems necessary to create one to join to for these sorts of things. But what a pain – let’s just write some SQL!
Let’s see if we can combine a CTE with a nifty Postgres function named generate_series to do exactly what we need.
Great, CTEs and a Postgres function bailed us out. The nice thing about this approach is I can now use the weekdaysToUse resultset for any subsequent CTE and the final query as well, and we’ll definitely need it.
We’re getting closer. Let’s add the joins to deliveries and clients, then run the query and see what we get.
Wait a sec, where are my zero delivery rows? We started the query off with weekdaysToUse, but will only get a row returned where we had at least a delivery from a client for that day. Shoot. Let’s make a query to fill in rows where a client did not have a delivery for that day. If we had that resultset, we could combine it with the results from the previous, and we’d be set.
Maybe we can try to use what we’ve learned so far and build up to it with CTEs.
If we take that resultset and UNION ALL the rows that weren’t zero earlier, we’ll be all set.
Requirement 2: Using Only Top Clients
Remember our business requirement about top clients that aren’t us?
Here is a subquery solution,
and here is a CTE solution.
The CTE allows you to logically structure your thinking, and improves readability 6 months from now when another developer has to grasp what the query you wrote was trying to do. Bonus points for choosing a smart name for the intermediate resultset. Mine is long, but describes exactly what it is – top five clients this week that aren’t us.
Putting It All Together
Now for the big reveal – but first, I wrote a naive subquery solution to the problem to demonstrate the alternative:
Yuck, all the requirements are just jumbled together. I also need to specify the date math and generate series all over the place, since I can’t reuse it without a CTE, and don’t want to scan too much data.
Now, at last, here is our complete solution featuring CTEs.
Straightforward, easy to understand at every step. Not too shabby!
Share this post:
Still Writing SQL in a Terminal?
Beekeeper is a free and full featured SQL IDE with query autocomplete, error detection, dataset previews, and result visualization. We support SparkSQL, Hive, Postgres and more!