Clustered Indexes (WITHOUT ROWID optimization) are a special optimization you can make that could make your queries twice as fast and it could save on disk space as well. Check out this video to learn more about this optimization.
"So, imagine we're writing a SQLite powered blog, and we've got this posts table that looks like this, but we've got a problem right now, which is the fact that we're querying this table by the slug, like this, but the query is slow, because there's no index on this slug column.
Ok, so what should we do to fix this?
Well, obviously we could add an index to the slug column like this, but let me ask you, is this the best optimization we can make?
In other words, does adding this index, cause this query to run as fast as possible?
Well, this might surprise you but, no, this index will help this query immensely, but we can actually do better, in fact there's something we can do, which can make this query run about twice as fast, and it involves a special kind of an index, but before I talk about this special index, I want to make sure you understand how a normal index works.
So, there's a hidden column in this posts table named rowid, and you can actually see this rowid column, if you explicitly add rowid to the select clause.
Now, I didn't add this rowid to the table, it's automatically added by default, to all normal tables.
Ok, so what's the point of this rowid, I mean, what's it used for?
Well, you can think of the underlying storage of our blog post data to logically look something like this.
Basically, the post data is stored in order, by the rowid, and if you know the rowid your looking for, you could query for a post by the rowid, and the query planner will perform an efficient binary search, to quickly find the row your wanting.
Now, would you ever run a query like this?
Well, I would bet that almost nobody runs queries with rowids, mostly because they aren't really meant to be used directly like this, they're meant to be used indirectly.
Let me explain, how rowids are normally used, by looking at the index created with this statement here.
So, when this index is created, logically, you can think about it being stored in the following way.
Basically, the slug values are sorted in alphabetical order, like this, and then each slug has its corresponding rowid, next to it.
So, if you were querying for this slug, the query engine would perform an efficient binary search, to find the slug in the index, then the query engine would use the corresponding rowid, to perform an additional binary search against the table, to locate the underlying row, at which point the row data is returned as a result.
So, as you can see, this rowid is how you go from normal indexes to the row or rows in the underlying table. Now, notice that we don't use the rowid directly, we use it indirectly.
Ok, so in summary, to satisfy this query, two binary searches happen. First, a binary search of the index, then a binary search for the rowid in the table.
So, this is how a normal index is used, to query a value, but this isn't the most efficient way of finding a post by it's slug, so do you have any thoughts on how we could make this query, more efficient?
Well, what if we got rid of the rowid, and instead we ordered the table data by the slug?
Then we wouldn't need to have this separate index, instead, we could just do one binary search, against the table, instead of two, which is about twice as fast, and also since we're not using a separate index, we'd save on disk space as well.
There's a name for this kind of an index, it's called a clustered index, and obviously, you can only have one per table.
Ok, so how would you set this up in SQLite?
Well, here's the original create table statement, and to add the clustered index, we need to do two things:
First, we need to make the slug the primary key, and secondly, we need to add without rowid onto the end of the statement.
Now, by doing this, we've eliminated the separate index, and now the table data is ordered by the slug, so when we run a query like this, the query can be satisfied by a single binary search of the clustered index.
Ok, it's important to note that we're not adding any new capabilities when we use a clustered index, we're simply adding an optimization.
All right, so when should you use this without rowid optimization?
Well, it's usually helpful for tables that have non-integer primary keys or composite primary keys, that don't store large strings or blobs.
Now, the general strategy on using clustered indexes in sqlite is to not worry about it until you're near the end of product development, then you can go back and run tests to see if adding clustered indexes to tables with non-integer primary keys helps or hurts performance.
Hey, here at Mycelial, we're big fans of SQLite, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.
If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."