Here’s an exercise for you. Go to the Craft of Emacs website and type the word “list” into the search bar.
You might think you’ve executed a perfectly ordinary text search, the sort you can do on Medium, or Hacker News, or Wikipedia, only simpler, to account for the site’s smallish content.
You’d be wrong.
You’d still be wrong.
If you were really competent with dev tools, and dedicated enough to pore over the sqlite.wasm file, and if you were really ahead of the web development curve, you’d know what wasm actually meant…
And you’d be right.
In making that search, you’ve just spun up a light but very powerful database. In your browser.
The trouble with text
For those of us who missed the invite to the web search party, an explanation is in order.
Designing a full-text search is hard. You need to store your entire corpus of text, potentially millions of words, split it into tokens, and then have a way of finding groups of those tokens that match a search term. Your method must cope with the intricate complexities of the English language, its plurals and punctuation, and must match multiple search terms in a way that gives accurate results. And it needs to be fast.
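The following toy inverted index in Python makes those steps concrete. It is a minimal sketch and does not reflect how any real FTS engine is built. It lowercases text and splits on punctuation. It fakes plural handling by dropping a trailing "s", which is a crude stand-in for a real stemmer and also mangles words like "this" into "thi". It answers multi-term queries by intersecting sets, so every term must match. The names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit,
    # so "List," and "list" become the same token.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Crude plural handling (a stand-in for a real stemmer): drop a
    # trailing "s" from words longer than three letters.
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def build_index(docs):
    # Inverted index: token -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Every query term must match (AND semantics), so intersect the sets.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical documents standing in for a site's articles.
docs = {
    1: "Lists, conses and cars.",
    2: "Buffers hold the text you edit.",
    3: "A list of buffers.",
}
index = build_index(docs)
print(sorted(search(index, "list")))          # [1, 3]
print(sorted(search(index, "LIST buffers")))  # [3]
```

Even this toy has to make choices about case, punctuation, plurals and how terms combine. A real engine makes those choices far more carefully, and it stores the index on disk rather than in memory.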
Designing a full-text search is hard, but it’s well understood in the space of data storage. There are off-the-shelf technologies like Elasticsearch, which is, well, you know, for search, and most databases have a full-text search feature (FTS, because we love our acronyms).
If you want to search a corpus, you only need to pick your favourite flavour of data store, slather it on a server, sprinkle it with text, and let its own search feature do the rest.
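SQLite’s FTS5 extension is one example of such a built-in feature. The sketch below drives it from Python’s standard `sqlite3` module. It assumes that the SQLite library Python links against was compiled with FTS5, as most builds are. The `posts` table and its rows are hypothetical. The `porter` tokenizer stems words, so a search for "list" also finds "Lists" and "Listing". Separate query terms are implicitly ANDed.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
# Assumes this SQLite build includes the FTS5 extension.
conn = sqlite3.connect(":memory:")

# 'porter' wraps the default unicode61 tokenizer with the Porter stemmer,
# so "lists" and "listing" are both indexed under the stem "list".
conn.execute(
    "CREATE VIRTUAL TABLE posts USING fts5(title, body, tokenize = 'porter')"
)

# Hypothetical posts standing in for a site's articles.
posts = [
    ("Lists in Lisp", "Cons cells are the building blocks of lists."),
    ("Buffers", "Every file you visit lives in a buffer."),
    ("Listing buffers", "C-x C-b shows a listing of open buffers."),
]
conn.executemany("INSERT INTO posts (title, body) VALUES (?, ?)", posts)

def search(query):
    # MATCH hands the query to FTS5; ORDER BY rank sorts by relevance.
    rows = conn.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]

print(search("list"))         # both "list" posts, best match first
print(search("list buffer"))  # ['Listing buffers']
```

Tokenizing, stemming, indexing and ranking all happen inside the database. The application only supplies text and a query.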
What data store you choose depends on your desires. Do you dream of creating a website as large as Wikipedia? Of supporting thousands of searches simultaneously? Of adding text to your corpus as quickly as you can type it?
For my own humble aspirations, I’d like a store that scales, but only a little, and that can be rebuilt quickly and easily each time I write. And since it’s exposed and vulnerable on the web, with all sorts of malicious characters typing and being typed into the search bar, it has to be safe.
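One common way to keep a search box safe has two parts. The sketch below uses SQLite’s FTS5 through Python’s `sqlite3` module and assumes an FTS5-enabled build. First, the user’s text is bound as a query parameter rather than pasted into the SQL, so it can never change the statement. Second, each word is quoted as an FTS5 string. This means characters such as an unbalanced `"` are searched for rather than parsed as query syntax. Passed to MATCH raw, such input typically raises an fts5 syntax error. The helper `to_fts5_query` and the `posts` table are hypothetical.

```python
import sqlite3

def to_fts5_query(user_input):
    # Keep only whitespace-separated words containing a letter or digit,
    # so punctuation-only fragments like "--" are dropped.
    words = [w for w in user_input.split() if any(c.isalnum() for c in w)]
    # Wrap each word in double quotes, doubling any embedded quotes, so
    # FTS5 reads it as a plain string, not as operators like OR or NOT.
    return " ".join('"' + w.replace('"', '""') + '"' for w in words)

# Assumes this SQLite build includes the FTS5 extension.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE posts USING fts5(body)")
conn.execute("INSERT INTO posts (body) VALUES (?)", ("Emacs lists and Lisp lists",))

def search(user_input):
    query = to_fts5_query(user_input)
    if not query:
        return []
    # The query is bound as a parameter (?), never spliced into the SQL
    # text, so input like "'; DROP TABLE posts; --" cannot alter the SQL.
    rows = conn.execute("SELECT body FROM posts WHERE posts MATCH ?", (query,))
    return [body for (body,) in rows]

print(search("lists"))                    # ['Emacs lists and Lisp lists']
print(search("'; DROP TABLE posts; --"))  # []
print(search('lists"'))                   # ['Emacs lists and Lisp lists']
```

Quoting every word trades away FTS5’s operators (OR, NOT, prefix `*`) for predictability. That is a reasonable default for a public search bar.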
A local flavour
My own comfort food is SQLite.
SQLite is, as its name suggests, a lightweight SQL database. It doesn’t scale like Elasticsearch does. It can only live on a single machine, backed by a single simple file, and scales just as a file scales.
Don’t let its unassuming nature fool you. It isn’t distributed, which makes it less complex than any heavyweight store could be, and more versatile inside an application.
It’s the data store your operating system uses if you’re reading this from a phone, and the data store your browser uses even if you’re not.
As for how it works: think of it less as a database and more as a library for incredibly sophisticated file manipulation. Whenever you store a bookmark in Firefox, it calls some SQLite code for persistence. That SQLite code, crafted in C, is embedded: woven into the application. Since the code is part of Firefox itself, Firefox doesn’t need to call out to another process, or worry about handling an asynchronous response.
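Python’s standard `sqlite3` module is a wrapper around the same SQLite C library, so it shows this embedded, file-backed model clearly. In this minimal sketch, the `bookmarks` table and its columns are hypothetical and are not Firefox’s actual schema. Opening a connection just opens a file. Every call runs inside the current process and returns when the work is done. The whole database is that one file on disk.

```python
import os
import sqlite3
import tempfile

# A throwaway directory stands in for wherever an application keeps its data.
path = os.path.join(tempfile.mkdtemp(), "bookmarks.sqlite")

# Connecting opens (or creates) the file. No server process is started:
# the SQLite library runs inside this Python process.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE bookmarks (url TEXT NOT NULL, title TEXT)")

# Each call is an ordinary, synchronous function call: it returns once the
# work is done, with no response to await from another process.
conn.execute(
    "INSERT INTO bookmarks (url, title) VALUES (?, ?)",
    ("https://www.sqlite.org/", "SQLite Home Page"),
)
conn.commit()
conn.close()

# The whole database now lives in that single file.
print(os.path.getsize(path) > 0)  # True

# Reopen the same file and read the bookmark back.
conn = sqlite3.connect(path)
rows = conn.execute("SELECT title, url FROM bookmarks").fetchall()
conn.close()
print(rows)  # [('SQLite Home Page', 'https://www.sqlite.org/')]
```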
In browser country
But there’s a limit to where SQLite can be used. It’s written in C, so whatever ecosystem you call it from needs to support C bindings. And while C is a lingua franca (any language worth its salt must bind to C eventually), the web notoriously couldn’t support it.
That is, until the advent of WebAssembly.
Through their own efforts, the SQLite team have wrangled with the compiler toolchain to generate a WebAssembly binary. That sqlite.wasm file, only 780KB in size, contains the whole of SQLite, or at least everything you need of it to run a full-text search.
Creatively, it uses your browser’s own storage as a filesystem. It’s your exclusive database: you aren’t sharing it with anyone else. I’m not tracking whatever you search for (I can’t), and you can’t, through malicious queries, harm anyone but yourself.
Is it performant? Worth the effort? Widely supported?
I’m an enthusiast writing a small website. At this point, I’m not concerned with those questions. But as a fond supporter of SQLite, who has been watching WebAssembly from the sidelines, I can say that SQLite WASM just is.
It is in my browser, and now in yours too.