SQLite and the N+1 (no) problem

learn
SQLite and the N+1 (no) problem

If you use a client/server based database (Postgres, MySQL etc) with tools like Object-Relational Mapper (ORM) or GraphQL, then your system is likely susceptible to the N+1 problem. But did you know that not all databases are susceptible to the N+1 problem? For example, SQLite doesn't experience the N+1 problem.

Transcript

"The N+1 problem usually happens, when using tools such as Object-relational mapping (ORMs) libraries or GraphQL, with client/server-based databases.

For example, consider the following snippet that uses an ORM to query some books and their corresponding book reviews.

Basically, in this snippet, we're querying the books on this line, then we're iterating through the books and querying each books reviews here.

So do you see the problem with this snippet?

If you're not sure, then let me give you a hint with the following question:

How many total database queries happen when this entire snippet of code executes?

Well to answer this question, you need to know how many books are queried on this line, because the number of books determines how many queries execute on this line.

The key here is to recognize that this method call to reviews, will get executed once for each book you've queried.

So if 20 books are returned on this line, then 20 queries will happen on this line, one for each book.

So in summary, we get N queries on this line, where N comes from the number of records returned from this 1 query here. Thus the N+1 problem.

Ok, I'm calling this a problem, but what exactly is the problem here?

Well, the problem is that individual queries are relatively expensive, because there is a certain amount of overhead for every query in pretty much all the popular databases, well, except for one, SQLite, but we'll talk more about SQLite in a moment here.

So why are queries more expensive for these databases?

Well, to answer this, I want to create a simple bar chart that represents the amount of time it takes to make a single database query and return its response.

Ok so, what are the component parts that make up this bar chart, in other words, how much of the total query time is spent on CPU time, and how much is disk IO and Network IO?

To approximate how much time each of these types of operations takes, I'll use the psql, command line tool for Postgres, then I'll turn timing on, and then I'll use Explain & Analyze for a simple book query, in a book review database, and down here is the total time to run the query, and over here we see the planning and execution time.

I'll run this same query a couple more times, and it looks like this last run is pretty close to the average of these 3 runs, so we'll use these numbers, to try and get a handle on the query time breakdown.

Ok, so the total query time is 1.248 ms, and if we add these two numbers together we get the amount of CPU and disk time, which is 0.079 ms, or about 6% of the total time, and the remainder of the time, shown as purple here, represents network communication, which is about 94% of the total time.

So as you can see here, the largest share of time used comes from network IO.

So let me ask you this, do you think a query that takes around 1 ms is a big deal?

Well, it depends.

If you're doing just a few queries and each query takes about 1 ms, then it's probably not a big deal, however, if you're susceptible to the N + 1 problem, meaning you use any of the popular client/server databases with an unsophisticated ORM or maybe GraphQL, then this 1 ms query time starts to become a really big deal.

Let's look at an even more problematic scenario, and we'll try and quantify the effects of the N+1 problem.

So, here's a slightly different query than you saw earlier.

And let me ask you, how many total queries would this snippet result in?

Well, you'll get 1 book query from this line, about 20 review queries on this line, one for each of the books, and about 5 author queries for each book review on this line.

If we do a little math, we have one book query, plus 20 review queries, and 20 times 5 author queries, or 100 author queries. So in total, we've got 121 queries.

Now if each query took about 1 ms, then we've got a total time of about 121 ms, just to fetch the data. This isn't good if you want a fast, responsive system, which to most people means, generating and returning a response within 100 ms.

But, this isn't just bad because it results in slow requests, it's also bad because it's likely going to result in your database becoming overwhelmed with queries, that use finite resources, resulting in even worse performance, and depending on your load, you could bring your database to its knees, with queries like these.

Let's take a look at our bar chart again, and let me ask you, what's the biggest contributor to the slowness of this type of query?

Well, obviously the largest portion of time is spent doing Network IO, the next most expensive part is this planning time, and lastly, we've got the execution time, which is actually pretty small in comparison to the other two times.

So let me ask you this, would we still be experiencing the N+1 problem, if we eliminated this networking-related portion of our bar chart?

Well, we'd be left with a planning and execution time of around 79 microseconds per query, and if we took our 121 queries from earlier and multiplied them times 79 microseconds. we'd get a total query time of about 9.5 ms, which would be a totally acceptable amount of time because it means you can return the response within 100 ms.

So is there any way of making this theoretical scenario work?

In other words, is there any way we could eliminate the most time-consuming parts of individual queries?

Well, it would mean removing any network latency between your application servers and your database servers, but in reality, you can't really cut your query time very far, when you're using a client-server database.

I mean, in my example from earlier, I was using Postgres locally on my fast M1 mac, and we still had about 1 and 1/4 ms of latency.

There's just a certain amount of unavoidable latency overhead, associated with client/server-based databases.

The only way to eliminate this latency is to switch from a client/server-based database to an embedded database, and the most popular embedded database is SQLite.

SQLite is a library-based, embedded, relational database that runs in your application process and it doesn't suffer from any network IO-related overhead because it's not using the network.

Now, it's a bit harder to measure execution time for SQLite queries with high precision, so I wrote a simple script to measure query times for the books and reviews, and we're seeing the book query time takes about 8 microseconds, and the reviews query is also taking about 8 microseconds. So even in the worst-case scenario (121 queries * 0.008 ms = 0.968 ms), we're completing all the queries within about 1 ms, which is well inside of our 100ms response time goal.

So as you can see, SQLite isn't really susceptible to the N+1 problem.

Now, some of you might be performance junkies, and you might be wondering if we could improve on this 1 ms time here.

Well, you could improve this number, by reducing the number of queries made.

In other words, consolidating some of the repeated queries into a single query that look something like this.

Ok, so you could optimize things further, and we could get all the data we need in just 3 database queries, instead of the 121 queries, and we'd have an overall query time of around 20 to 30 microseconds, but the question is, would the optimizations be worth doing?

I'd say probably not. I mean, first of all, our goal is to return a response to the client within 100 ms, and 1 ms is almost insignificant when compared to our target response time.

Additionally, if your optimizations result in more complicated code, which it likely would, then it might not be worth doing. I mean, I think generally you should tend to favor simpler solutions, for cognitive reasons, so long as the simple solution has satisfactory performance.

And I mean a, 1 ms query time is more than good enough, so why make your code more complicated just to squeeze out an imperceivable amount of performance gains?

So, how does this unoptimized SQLite solution compare to an optimized Postgres solution?

Well, an optimized Postgres solution would result in 3 queries, which were taking about 1 and 1/4 ms each, so we'd end up with a total query time of about 3.75 ms or longer.

Then, with an unoptimized SQLite solution, you'd have 121 queries, that take about 8 microseconds each, so the total query time would be about 1 ms.

So as you can see, an unoptimized SQLite solution is still about 4 times faster than the optimized Postgres solution.

Hey, here at Mycelial, we're big fans of SQLite for reasons like we just talked about, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.

If you are interested in SQLite, please subscribe to our channel, and consider joining our newsletter to learn more."

MORE ABOUT MYCELIAL