SQLite for beginners: Fixing Slow Queries

learn
SQLite for beginners: Fixing Slow Queries

Did you know that SQLite offers profiling tools to help you identify the cause of slow queries? There's even a tool that suggests indexes you should add to improve a query's performance.  Watch this video to learn more.

Transcript

"So, how do you find and fix slow queries in SQLite?

Well, finding poor-performing queries is usually done by setting up good logging, but then the question is after you've identified a slow query, how do you find and fix the root cause of the slowness?

Now, you could manually investigate slow queries, and for seasoned developers, this often works out just fine, because an experienced developer will typically have a good intuition on what to look for.

But, you don't have to manually troubleshoot slow queries, because, well there's some nice profiling tools at your disposal, which can be helpful, especially to SQL newbies and to experienced developers working on more complicated queries.

So what profiling tools does SQLite provide?

Well, one of the most helpful profiling tools is the "explain query plan" command.

So, how do you use the Explain Query Plan command?

Well, you simply put the phrase "explain query plan" in front of a query, just like this.

Let's use this profiling tool against an instance of the northwinds database for SQLite. And by the way, this database we're going to use has no user-defined indexes, so we should expect to see some query profiling problems.

Ok, let's look at some examples.

So, in the terminal, I'll key in sqlite3, the command line utility that comes with SQLite, and I'll specify the northwind database name.

Next, I'll enter the query you saw a moment ago, but I'll prefix by the command with "explain query plan", and when I press enter, we don't see data returned from the query, instead, we see this description of the query plan, shown here.

So, what's this telling us?

Well, for each table or view in a query, you'll see a line like this, that either starts with SEARCH, as you see here, or it will start with SCAN, which you'll see an example of in a moment.

So, what does this term Search mean?

Well, search means that an index is being used, which will give you performant queries, but, a moment ago I said there were no user-defined indexes, so where is this index coming from?

Well, you can find a hint about where this came from, by looking at the name right here.

So, here's the thing, SQLite will autogenerate some indexes, like the one you see here.

Ok, so why did SQLite automatically generate this index?

Well, let's look at the schema for the customer's table, by keying, dot schema, followed by the table's name, customers.

Now, this schema command will show us all the CREATE statements associated with the customer's table.

In other words, it will show us the CREATE TABLE statement that was used to create the table, and it will show us any CREATE INDEX statements for the table if there are any.

Ok, so I'll press enter and we see the create table statement, as expected, but we don't see any create index statements, so, do you have any thoughts on how this auto index was created, given what we see here?

Well, this primary key constraint here, is what caused SQLite to autogenerate this index, which makes sense if you think about it because the only way to enforce a primary key constraint would be to have a unique index on the appropriate column or columns that make up the primary key.

So is there anything we could do to improve this query?

Well, no. We're querying by the customerID, which has this unique index which gives us the most optimized query result you could expect.

Let's run another query against the customer's table, but this time we'll search for the customer by its name.

So I'll enter the query, select star from customers where the company name is 'Island Trading', and I'll add the "explain query plan" in front of the query, and I'll press enter.

So, this time we see the query plan is to do a scan of the customer's table.

So what does Scan mean?

Well, it means there's no indexes available for this query, so to get the result the entire table must be scanned, to find the customer or customers needed.

So, if you see a scan like this, it's a red flag that should draw your attention, as the likely culprit for the slow query.

Ok, so how can we improve this query?

Now, for a lot of you, it's obvious how you could improve this query.

We could just add an index on the company name column, and our slow query problem is solved.

But, sometimes it's not so obvious what indexes you should add to improve a query. And it could be hard because the query is really complicated, or maybe it's harder because you're a SQL newbie.

Wouldn't it be nice if the database could suggest which indexes you should add to make a query more performant?

Well, as it turns out, there is a dot command that can suggest adding indexes to improve a query's performance.

Let's take a quick look at this helpful dot command.

So, first I'll key in the dot command, dot expert, and I'll press enter.

So, what did this do? Well, nothing yet, but watch what happens when I run the query again.

Basically, we see a suggestion here, to add an index on the customer tables, company name column, and if this index existed, then the query plan would be a search that uses the suggested index.

Let's go ahead and add an index as suggested here.

So, I'll create an index named customers, company name, idx on the customer tables, company name column.

Ok, so this index is effectively what was suggested on this line, though I gave it a more meaningful name.

Now, if we analyze the query by the customer's name again, we see that this time the query plan is to search for the customer using our newly created index.

So, what's the performance difference between this scan from earlier and the new search-based query plan, you see here?

Well, a scan is an O of n operation, whereas the search is a log of n operation, so if the table had a million rows in it, the scan will need to look at all one million rows, however, an indexed search would only need to touch about 20 rows to find the result, so the performance implications are huge, and it could be the difference between a query taking a few microseconds, vs many seconds.

In fact, I'll turn on timing, and I'll run the last query again, and we see execution times in micro-seconds, and if we drop the index we just created and run the query again, we see the execution times have increased by a couple of orders of magnitude.

Let's look at a more complicated query, where we join the customers and orders tables, and we're looking for all the orders for this one customer.

Now, when I press enter, we see two lines in the query plan, and let me ask you do you see any potential problems here?

Well, again, any time we see a table scan as we do here, we know were dealing with an O(n) operation, and ideally, we'd like to use an index to achieve log of n performance.

So, what should we do?

Well, if this is a common query, which seems likely, then we should probably add an index, so what index should we add?

Now, we could use expert mode again, to get a suggested index, but lets try and figure out what needs to happen manually, for now.

So, what index should we add?

Well, to answer this, we need to look at the where clause.

So, right here, we're querying for a specific customer in the orders table, which is the problematic part of the query, so we should add an index for the customerid column on the orders table.

So I'll go ahead and create an index on the order tables customerid column, like so.

Now I'll analyze this query again, and this time we don't see any scans which is a good sign, instead, we see two searches which means both tables are utilizing indexes in this query.

You know how a few moments ago, we saw the automatic index, that was related to the primary key? Well, this isn't the only time you might see automatically generated indexes.

For example, I'm going to create two tables, t1 and t2 as follows:

Then I'll do an inner join on these two tables, using the 'a' column from both tables, and I'll prefix the query with 'explain query plan'.

Now, we haven't actually created any indexes on these two new tables, so you probably wouldn't expect to see an index used in this query, but check this out.

Table t1 will do a table scan, as you might expect, but surprisingly, table t2 is using an index that we never created.

So what's with this index? I mean, where did it come from?

Well, if we use the dot indexes command to show the indexes on the t2 table, well, we don't see anything returned, so whatever index this is, it doesn't exist at this point and time.

So what's going on here?

Well, in this particular situation, the query planner notices that this query would result in awful performance if it had to use table scans.

I mean, if you think about this query, finding the solution would result in a nested loop, kind of like this.

In other words, we have to do 1 table scan for table t1, and we have to do N scans of table t2, where N is the number of rows in table t1.

So, what's the big O notation for this inefficient query?

Well, its O of N times M (O(N*M)), where N is the number of rows in table t1, and M is the number of rows in table t2, in other words, this inefficient solution would approximate quadratic performance, assuming both tables have similar row counts, which is really bad performance.

Now, the query planner recognizes how a naive solution, like this, is horrible, and it looks for a better option.

So, what would a better option look like?

Well, the planner could create a temporary index on the t2 tables, "a" column, which takes Mlog(M) time, and then it could use the temporary index to give us a total query complexity of Nlog(M).

So the overall complexity of creating and using the temporary automatic index is Mlog(M) to create the index, plus Nlog(M) to execute the query.

This combined performance is orders of magnitude faster than the naive unindexed solution would be.

So, if you see an automatic index like this, you need to realize that it's generated on the fly, for this one query, which is still rather expensive.

So, just because you see a search with an index, it doesn't mean you're using an efficient query. In fact, in this situation, you should still create a persistent index, so that this relatively expensive on-the-fly indexing operation can be avoided.

So in summary, if you've got a poor-performing query, you can use the explain query plan command, to help you identify where the problem is, so you can address it.

Additionally, you can use the dot expert mode, to get the database to suggest indexes that you should consider adding but keep in mind, expert mode is an experimental feature, and likely won't offer suggestions for all possible queries.

Hey, here at Mycelial, we're big fans of SQLite, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.

If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."

MORE ABOUT MYCELIAL