SQL Common Table Expressions (CTEs)

learn
SQL Common Table Expressions (CTEs)

A Common Table Expression (CTE) in SQL is a named temporary result set, derived from a SELECT statement, that can be used within another SELECT, INSERT, UPDATE, or DELETE statement. It is defined using the WITH clause followed by a SELECT statement and a name for the CTE, and can be referred to by name in the main query. CTEs are useful for breaking down complex queries into simpler, more readable parts, and can also improve query performance by reducing the need to repeat subqueries. Want to learn more, check out this video.

Transcript

"Common Table Expressions, or CTEs, look like this.

Basically, they start with the 'with' keyword, and it's followed by a name of your choosing, then inside a set of parenthesis you'll write a SQL statement and then you'll invoke the CTE in a SQL statement like this.

Ok, so what can you put right here?

Well, a select statement of some sort.

So for example, I could include something as simple as select 1.

So what do you think will happen when I run this entire statement?

Well, let's try it out and see.

Ok, so we got a one, which you probably could have guessed.

All right, so is this useful somehow?

Well, it probably doesn't look all that useful yet, but stay with me because CTEs do allow you to solve certain problems in an elegant way, but I'll talk more on this in a moment.

Now, one way I like to think about CTEs is that they can act like a refactoring tool of sorts, in other words, if you have a really complicated query that's hard to understand, well you can use CTEs to refactor the query and make it a little bit easier to read.

Let me show you what I mean with an example.

So, I've got this employee's table, and let's say we need to write a query that returns all the employees that earn less than the average salary.

Ok, so how should we write this query?

Well, I'll write it as follows: select star from employees which I'll alias as a, next we need to somehow calculate the average salary.

Now, there are a few different ways we could go about this, but I'm going to use a subquery. So I'll say, join, then in a set of parenthesis, I'll say, select the average salary, which I'll alias as average salary, from the employee's table, then I'll alias the subquery as b, and lastly, in a where clause, I'll say, where a dot salary is less than b dot average salary.

Now, let me ask you this, how readable is this query.

Ok, this is somewhat subjective, but I think this query is a little hard to read, mostly because of this subquery.

So, what if we used a CTE instead of this subquery, do you think it might improve the readability?

Well, let's try it and see.

So I'll go ahead and add our CTE by keying, with, average salary, as, then I'll cut the subquery here, and I'll paste it into our CTE.

Now I'll add our CTE to our from clause and I'll run our query, and we get the same result as earlier.

Ok, so is this query more readable than the one, that we used a moment ago?

Well, again, this is somewhat subjective, but I think this version is a little bit easier to read, mostly because, I can just look at this part of the query, and I largely know what's going on, but if I want to better understand what this part of the query is doing, I just need to peek up here to see what the CTE is doing.

Now, this example is somewhat simple, but hopefully, you can see how CTEs can make very complicated queries easier to read and understand at a glance.

Now, using CTEs to improve the readability of a query, is nice, but it's not the most interesting thing you can do with CTEs.

The really interesting thing you can do with CTEs is a form of recursion.

Ok, so what does recursion mean in the context of CTEs?

Well, it's recursive because it references itself inside the CTE?

Now, this concept of recursion might seem a bit fuzzy at the moment, but I think a simple example of how to do recursion with CTEs should help you wrap your head around what it is.

Ok, so what's the simple example we'll be looking at?

Well, we're going to use a recursive CTE to generate the following resultset. Basically, we're going to create a result with 10 rows, where each row has a number going from 1 to 10.

So to do this, I'll say, with and I'll use the recursive keyword, followed by a name for our CTE, I'll call this the counter CTE, then I'll say, as, followed by a set of parenthesis.

Now, inside the parenthesis, we're going to write two queries, separated by the union all operator.

We'll call the first query the anchor query, and we'll call the second query the recursive query.

Now, the anchor query, here, is executed first and it's only executed once, to bootstrap the recursive query.

So, what do we want the anchor query to return?

Well, remember, we're wanting a set of rows with the numbers 1 through 10, so our first query should return the number one, so to do this, I'll say, select 1 and I'll give it an alias of num.

Now, the query we add after the union, the recursive query, is different from the anchor query here, in that it recursively queries the CTE. In other words, if I were to say, select num from counter, as the second, recursive query, the num value here, initially comes from this query, so initially the num is 1.

What do you think will happen if I run this query?

Well, let's try it, so first I'll invoke the CTE by saying, select star from our counter cte, and then I'll run the query.

Ok, so what's going on here?

Well, it appears that I created an infinite loop or more accurately an infinite recursion, so I'll go ahead and kill this query.

Ok, obviously that wasn't what we were wanting, but we can get closer to a working solution, by simply incrementing the num value here.

So, what do you think will happen if I run this query?

Well, lets try it.

Ok, well that's slightly better, in that our number is incrementing, but we've still got an infinite recursion.

So, how do we prevent an infinite recursion?

Well, the way to break out of a recursive CTE is by having this second query, the recursive query, return an empty result.

So for example, I could add a where clause to our second query where I say only return a result if the num is less than 10.

So, now when the number is incremented to 10, this where clause will be false and the query won't return a result, which breaks us out of the recursive query.

Now, let's go ahead and try this, and it worked, cool.

All right, this simple example hopefully gave you a basic sense of how a recursive CTE works, but you're probably wondering, when would you typically use a recursive CTE?

Well, recursive CTEs are usually helpful when you're dealing with hierarchical data structures.

So for example, this employee's table we queried a moment ago, has a hierarchy to it. Essentially what we're looking at is an organization chart in a table structure, where Arthur, the CEO is at the top of the chart because he has no manager id, then you've got these three VPs, which report to the CEO, as indicated here by the manager id, and then the rest of these employees, report to the VP's.

Now, let's say we'd like to write a query that shows Randy and all the managers that Randy reports to. In other words, we want to write a query that returns Randy, and then it recursively returns every manager up to and including the CEO.

Ok, so how do we write this recursive query?

Well, I'll start by writing the boilerplate for a CTE, so with recursive, and I'll name our CTE org, then as, followed by a set of parenthesis.

So, what do we need to do next?

Well, we'll start with the anchor query, so what's the first record we need to return from our recursive cte?

Well, it's Randy, so I'll say select star from employees where the name is randy.

Next, I'll add the union all operator, and now the question is, how do we write the recursive query?

Well, we know we've got to query the CTE in this second query, after all, that's what makes CTEs recursive, so I'll say, select star from org, which I'll alias as a.

Ok, what would happen if I invoked this cte query right now?

Well, there's nothing to terminate the recursive query, so it would result in an infinite recursion.

Additionally, this recursive query is just returning randy's row, which isn't what we wanted. What we want to return is Randy's supervisor, so to return the supervisor I'll join the employees table, which I'll alias as b, and I'll join on a dot manager id, equals b dot id.

And lastly, I need to modify this select clause so that it returns the same columns as this anchor query did. So, to do this, I'll simply prefix the star with the alias to the employee table, which is b.

Next, I'll invoke our CTE by entering select star from org and now let's run this query, and we see the result we were looking for, Randy reports to William, and William reports to Arthur, cool.

So let me ask you this, in our query here, how is the recursion being terminated?

Well, it's terminated because the CEO has a null manager id, so it can't be joined with any other employees, thus this entire SQL statement doesn't return a value when the recursion makes its way up to the CEO.

So, in summary, CTEs can serve a couple of purposes:

  1. You can use CTEs as a refactoring tool, to make your queries more readable
  2. And you can use CTEs recursively to query hierarchical data structures, which is a really powerful feature.

Now, one last thing. I used the excellent SQLite database for my examples, but CTEs are available in all the popular relational databases, but there's a small catch. The exact syntax you use for CTEs can vary a little bit between database engines, so you may need to refer to the docs if you're running into syntax errors.

Hey, here at Mycelial, we're big fans of SQLite, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.

If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."

MORE ABOUT MYCELIAL