SQLite for beginners: Vacuuming

learn
SQLite for beginners: Vacuuming

SQLite doesn't automatically free up disk space as it becomes available. In other words, if your database is 1 GB in size and you delete some or all of the data, in the database, the final size of the database will still be 1 GB.

So how do you free up unused space in SQLite?

You need to vacuum your database.

So, how do you vacuum your database?

Watch this video to learn more.

Transcript

"SQLite databases automatically grow in size as needed, which probably seems pretty obvious and intuitive, but what isn't so obvious, is how SQLite databases shrink in size when you delete things.

Let me give you an example, of what happens when you delete things in SQLite.

So, I've got this SQLite database, which you see right here, that's just shy of 600 MB in size.

I'm going to open this database in the command line, and I'll delete all the records from the contacts table where the state is Texas.

Now, just so you know, this database only has one table, the contacts table, and the contacts that we just deleted, represent about 5% of the total contacts.

So, basically, we just deleted about 5% of the rows in this database.

All right so, how much disk space do you think was freed or recovered, by the deletion I just did?

Well, let's take a look.

Hmm, well, the file size hasn't changed at all, so what's going on here?

Well, here's the thing, SQLite by default, doesn't free up disk space, when you delete records, what happens is, that SQLite will delete the data from the underlying database pages and then it will mark any unused pages as free, but it doesn't actually free up any file space.

So, is this a problem?

Well, it depends on your usage patterns and the production environment.

In other words, if your app does a large amount of deletions, your database file could be significantly larger than it has to be, but the question remains, is this actually a problem?

Well, if you've got plenty of disk space, then I wouldn't worry too much about freeing up more disk space because you don't need it.

So, does this mean that the space used for contacts I just deleted is wasted?

No, nothing is wasted, what happens when rows are deleted is that the underlying database pages that were used to store the data get marked as free or available and they will be reused, when more storage is needed.

Ok, but some of you are probably wondering, what if you delete lots of data from your SQLite database, and/or what if you are low on disk space, is there a way of recovering that freed up storage space?

Yes, there is a way of shrinking your SQLite database file, and the process is called vacuuming, but you need to know something about vacuuming, which is the fact that the process of vacuuming your database is very expensive.

So, what do I mean by expensive?

Well, when you vacuum an SQLite database, the vacuuming process requires an exclusive lock on the database, and it often takes a non-trivial amount of time to complete, so to vacuum, you need some amount of downtime, and often vacuuming will take minutes or tens of minutes to complete, depending on the size of your database.

Now, downtime could be a deal breaker for you, so, what if you just don't vacuum, is that ok?

Well, if you're not concerned about disk space, then you can largely get by without vacuuming, and things generally work fine, except for one small problem, which is the fact that the data in your SQLite database can become fragmented, over time, especially if you do a lot of inserts, updates and deletes.

Ok, so what do I mean by fragmentation in SQLite?

Well, ideally SQLite would store related data, contiguously in the database, as much as possible, to optimize IO, but over time, data can become scattered across pages in the database file, and it can have a noticeable impact on query performance.

So, what kind of performance impact are we talking about?

Well, in the worst-case scenarios, a query could slow down by a factor of up to 4 times, so we're not talking about orders of magnitude, but it's enough of a slowdown, that you probably shouldn't ignore it.

So, how do you know if your database is badly fragmented?

Well, I don't know of any tools that definitively show how fragmented your database is, the best advice I've seen is to set up logging so you can track query performance over time, and if the logged query times reach a degradation threshold, then you can consider performing some maintenance, to essentially defragment, your database pages.

All right so, how do we defragment the database?

Well, as you might have guessed, you vacuum the database, to defragment it, with the following pragma command.

Ok, so how does vacuuming work?

Well, at a high level it works like this:

First, you initiate a vacuum by issuing the vacuum command, then SQLite creates an exclusive lock on the database, then it creates a new temporary, empty database file, next it reads the original databases schema, and it creates the tables, indexes and so on, in the new database.

Then, the vacuuming process copies the contents of the original database into the new database, and while doing this, it effectively defragments the database, and of course, it frees up any unused space.

Then finally, SQLite delete's the original database and replaces it with the new one.

Ok, so at this point, I've shown you two of the reasons you'd want to vacuum your database:

  1. To free up space
  2. And to defragment the database

Now, there's a third scenario where you'd need to vacuum, which is after making certain configuration changes.

For example, if you change the page_size, with the following pragma command, this change would only work if the database was empty beforehand, and if the database isn't empty when changing the page_size, then you have to vacuum the database, to apply the configuration change, which effectively creates a new database from scratch, with the new page size applied.

Now, there's an alternative to performing manual vacuums, which is auto_vacuuming, well kind of, what I mean is, auto_vacuuming does part of what vacuuming does, but you'll see what I mean by this in a moment here.

Ok, so what does auto_vacuum do?

Well, as I just said, it doesn't do everything that vacuuming does, in fact, auto_vaccuum is only focused on freeing up disk space, and it doesn't help defragment at all, in fact, auto_vacuuming could actually cause worse fragmentation, but I'll talk about this in a moment.

So how does this auto_vacuum PRAGMA work?

Well, there are three settings you can use for auto_vacuum.

None, which is the default and then full and incremental mode.

Let's look at how full mode works first.

When auto-vacuum is set to full mode, then after every transaction, any free or unused pages are moved to the end of the database file, and then the database file is truncated, effectively removing the unused pages, and freeing up some disk space.

So, if you use full auto-vacuum mode, your database file will stay as small as possible, which seems nice, but do you see any potential problems with what I've just described?

Well, you might get a bit of extra disk thrashing, as pages are continuously added and removed while executing queries, which could lead to further disk fragmentation.

And additionally, each transaction in the database can take a bit longer than necessary, because at the end of each transaction, the database connection, potentially has to do a bit of auto_vacuum-related cleanup work.

Now, the other auto_vacuum setting is Incremental mode.

So how does the incremental mode work?

Well, in incremental mode, the information needed to perform auto-vacuuming is stored in the database file, in other words, the database keeps track of free pages, but auto-vacuuming doesn't occur automatically.

Now, to actually initiate vacuuming, while in incremental mode, you've got to issue the pragma command, incremental vacuum, which can optionally take a parameter that represents the maximum number of pages to be removed from the freelist.

So, now you've got a sense of how to vacuum and auto_vacuum, but the question you're left with is, what vacuuming strategy should you use?

Well, the first thing I'd suggest is to start routinely monitoring certain key measures.

So, what should you monitor?

Well, a couple of valuable things to measure are:

  • The page freelist_count, which you can access with this command, gives you the number of free pages in the database.
  • The next thing you should measure is the total page count, which you can access with this command.

So, what do you do with these two measures?

Well, you might only consider performing a vacuum, if the free page count is a significant percentage of the total page count.

But, if the free page percentage remains pretty small then there's a good chance you shouldn't worry about vacuuming to shrink the database file size, because there's probably not a lot of space to be recovered.

Ok but, as I said earlier, the second reason for vacuuming is to defragment the pages, which can improve query performance.

So, how do you know if you should vacuum for performance reasons?

Well, as I said earlier, there isn't a tool for determining how fragmented your database might be, so instead, you'll have to monitor query performance by logging execution times, so you can detect query performance degradation, over time.

Then, if your queries start to slow down, with logging, you can quantify how much slower your queries are running, and you can then decide if vacuuming makes sense, for performance reasons.

So, now the question is if you're going to vacuum, when should you do it?

Well, you'll have to figure out when to vacuum, based on your environment, but some options to consider are:

  1. Vacuuming at startup
  2. or at shutdown
  3. or during idle periods
  4. or user-initiated via a menu option
  5. or when applying updates
  6. or maybe after hours during a maintenance period

Hey, here at Mycelial, we're big fans of SQLite, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.

If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."

MORE ABOUT MYCELIAL