SQLite for beginners: JSON

learn
SQLite for beginners: JSON

Did you know that you can store and manipulate JSON in SQLite?

Watch this video to learn more about how and when to use JSON in SQLite.

Transcript

"In recent years, SQLite has added support for handling JSON inside the database.

Ok, but what does this mean?

Well, SQLite doesn't have as much support for JSON as some of the open-source alternatives like Postgres, but it does allow you to store JSON in text columns, and SQLite gives you the following JSON-related functions and operators, which you can use to manipulate and query the JSON values.

Now, in a moment here, we'll start using some of the JSON-related features in SQLite, but there's a bigger question I want to tackle first, which is: "Why would you want to use JSON in a database?"

Well, I'll start by saying, if you use JSON in your database, you should use it sparingly.

Why do I say this?

Well, for a few reasons. For example, if you resort to storing JSON in your database, you're losing certain features, like foreign keys and other constraints, but probably more importantly, manipulating and querying JSON in the database is often slow and the queries are usually much more complicated than the conventional equivalents.

Ok, so when, if ever, should you use JSON in your database?

Well, one common example is logging of external API calls.

Most APIs return JSON, so it might make sense to log the JSON response as it was returned from the API.

Ok, but beyond logging, when might you want to store your data as JSON?

Well, sometimes it makes sense to use JSON to store sparsely used, attributes about an entity.

For example, I've got an SQLite database, which we'll look at in a moment here, that has an order_lines table that's populated with kitchen cabinet sales information.

Ok, so why are we looking at a kitchen cabinet example?

Well because, kitchen cabinets are usually made to order, meaning that each order is highly customized, and then each cabinet is uniquely built as specified in the order.

So basically, each row in this table stores a cabinet along with its unique set of build attributes.

So the question is, how do we store the attributes?

We could have a column for every possible attribute, but the thing to keep in mind is that many of these attributes are used sparingly. In other words, some attributes might be used just once, and never again, so it doesn't seem like adding a column would make sense for sparingly used attributes.

Ok, so if we're not going to create columns for every attribute, then how do we store these attributes?

Well, this is where storing data as JSON can be a reasonable solution.

In fact let me show you the example I've created for this video.

So, here's a few rows from the order_lines table, and you'll notice at the end here, we see an attributes column, which is just a text column, that stores some JSON.

Now, do you see anything here that bothers you?

Well, what I'm referring to is the fact that every single row, seems to have depth, width and height attributes, which begs the question, should these dimensional attributes, be in their own columns, or is it ok to have them in the attributes column?

A good rule of thumb on this is that you should probably only store JSON attributes if they're sparsely used, but obviously, the depth, width and height attributes aren't sparsely used, they seem to be used on every row, so I think the shape of our table, isn't ideal, but we'll fix that in a moment here.

Ok, so how do you perform queries with JSON values?

Well, let's look at an example.

So let's say we'd like to query for the total quantity of 24-inch deep cabinets that have been sold.

What would this query look like?

Well, we'd select the sum of the quantity from the order lines table, where the depth is 24 inches, but now the question is, how do we set up this where clause to filter by a JSON attribute?

Well, there's a couple of options.

First, we could call the JSON, extract function, passing in the attributes, which is our JSON column, then I'll pass in a valid JSON PATH expression. So I'll key in a dollar sign, which means starting at the root of the JSON document, then I'll key in dot, depth, and then I'll say the result of this expression should equal 24.

And if we run this query, it takes a bit of time, but we get our result.

There's another way to query for the depth, which is by using this double arrow operator.

Basically, I'd say where attributes, then I'll use the operator, followed by a valid JSON path expression, so dollar sign, dot depth, and I'll say the result of this expression should equal 24.

And if we run this query, it still takes a bit of time, at around 7 or 8 seconds, but we get our result.

Now as I suggested a moment ago, the depth, width and height probably should be in their own columns, and not in this JSON attribute column.

So, let's fix this.

First, I'll alter the order_lines table to add the depth, width and height columns.

Next, I'll update the order lines table setting the depth equal to the depth stored in the attributes column, and I'll do similar updates for the width and the height.

Ok, now that we've moved the dimensions into their own columns, we can remove them from the attributes column.

So, to remove the dimensions, I'll say, update order_lines, set the attributes column equal to the result of calling the json_remove function, then I'll pass in the attributes column name, followed by one or more JSON path expressions that specify the specific attributes to remove. So, I'll add an expression for the depth, the width, and the height, then I'll execute the query.

And if we take a quick look at our table, we see that the dimensions exist in their own columns, and the attributes no longer contain the dimensions, cool.

Now, do you remember how I queried for the quantity of 24-inch deep cabinets a few moments ago? If you recall, it took about 7 to 8 seconds to complete.

Let's create this query again, but this time I'll use our new depth column.

Now, when I run this query, it's still pretty slow, as it takes around 1 to 2 seconds, but it was quite a bit faster than the last time we ran it.

So, why was it so much faster?

Well, for a couple of reasons. First, there's more Input/Output involved when you're filtering based on a JSON column, and secondly, there's more CPU used to extract the values we needed to use in the where clause.

Let's perform another query, where we try to find the number of cabinets sold, where the hardware is Brass.

Ok, so that query worked, but it was really slow, so is there anything we can do to improve this query?

Yep, we could add an index to improve the performance.

So, I'll create a new index, on the order_lines table, but instead of indexing on a column, I'll index on the following JSON expression.

Now I'll press enter to create the index.

Next, I'll re-run the query for the brass hardware, and as you can see, the queries performance is greatly improved, though it still does take a bit of time.

Ok, so what's the takeaway here on using JSON in SQLite?

Well, you can use JSON in columns, but you should probably only use them for sparely used attributes, because when you use JSON in your database, you're losing out on many of the native features like constraints, and the queries are generally slower, and more complicated.

So, go ahead and use JSON where it makes sense, but it should probably be one of the last tools you reach for.

Hey, here at Mycelial, we're big fans of SQLite, and we're building tools around SQLite that will allow you to synchronize a SQLite database to multiple computers without conflict.

If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."

MORE ABOUT MYCELIAL