GlareDB | Compiling GlareDB to WebAssembly: A Peek Under the Hood

In my previous blog post, I started to share a bit more about GlareDB's new execution engine and some of the motivations behind it. In today's post, I want to highlight an additional goal we had: compiling GlareDB to WebAssembly.

What is WebAssembly and Why it Matters

Normally, when you think about developing applications for the browser, you're thinking about JavaScript (and all the other languages that "transpile" down to JavaScript). WebAssembly offers an alternative.

WebAssembly is a binary format that enables high-performance execution of code in web browsers. Many major languages like C, C++, and Rust can target WebAssembly for compilation, allowing developers to bring high-performance applications to the web.

When we decided to rewrite our execution engine, a major goal was to ensure it could be compiled to WebAssembly. Having a self-contained database engine in the browser gets us closer to "serverless" analytics. Without needing to download anything, I can run analytics on S3-hosted Parquet files, or query open datasets over HTTP. These possibilities were important enough to shape the architecture and development process of our new engine.

Every feature we develop will treat WebAssembly as a first-class target. I want the experience of using GlareDB compiled to WebAssembly to match the experience of running GlareDB locally.

Shaping the Architecture

A motivating query for this discussion could be a simple aggregate on a Parquet file in S3:

SELECT avg(salary) FROM 's3://glaredb-public/userdata0.parquet';

While basic, it clearly shows the two high-level steps in query execution:

Fetching/loading the data, and
Executing operations on the data.

These seemingly simple steps require significant engineering work when building a system that runs both natively and in WebAssembly. Unfortunately, WebAssembly comes with quite a few restrictions that require a fair amount of upfront planning. These range from missing threading primitives to not being able to run blocking code. Even functions that exist and compile with WebAssembly as the target aren't guaranteed to work at runtime.

So, we needed to build proper abstractions around this.

Abstracting the Runtime

GlareDB has a concept of "runtimes," which define how queries are executed and how data enters the system. Queries are executed by something called a PipelineRuntime, which handles all the CPU-heavy work. Data is fetched via a FileSystem abstraction, whether from local files or object storage.

When executing on your laptop, GlareDB uses a multi-threaded PipelineRuntime. Any network I/O happening through the FileSystem runs separately within Tokio, ensuring that I/O doesn’t block query execution.

But... we couldn’t just compile that to WebAssembly. There are no threads, so multi-threaded execution is out. And we have to avoid blocking. So what do we do?

First, let’s step back and look at what PipelineRuntime actually is. When a SQL query comes in, GlareDB parses and plans the query, producing a collection of PartitionPipelines. Each one is a small chunk of the query, and the PipelineRuntime is responsible for executing them.

The key point here is that we don’t need to track the specific order of execution. If a PartitionPipeline isn’t ready to run (e.g. it depends on the output of another), it simply registers itself to re-execute once data becomes available.
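To make the re-registration idea concrete, here is a minimal, std-only sketch of a single-threaded scheduler. It is not GlareDB's implementation: `Slot`, `Recv`, `send`, `run`, and `demo` are hypothetical names, and a Rust `Waker` stands in for whatever mechanism the engine actually uses to re-queue a PartitionPipeline.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical hand-off point between a producing and a consuming pipeline.
#[derive(Default)]
struct Slot {
    value: Option<u64>,
    waiting: Option<Waker>,
}

/// Future that resolves once the slot holds a value.
struct Recv(Rc<RefCell<Slot>>);

impl Future for Recv {
    type Output = u64;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let mut slot = self.0.borrow_mut();
        match slot.value.take() {
            Some(v) => Poll::Ready(v),
            None => {
                // Not ready yet: register for re-execution, then yield.
                slot.waiting = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Stores a value and re-queues whichever pipeline was waiting on it.
fn send(slot: &Rc<RefCell<Slot>>, v: u64) {
    let mut slot = slot.borrow_mut();
    slot.value = Some(v);
    if let Some(waker) = slot.waiting.take() {
        waker.wake();
    }
}

/// Waking a task just pushes its id back onto the ready queue.
struct TaskWaker {
    id: usize,
    ready: Arc<Mutex<VecDeque<usize>>>,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.ready.lock().unwrap().push_back(self.id);
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Polls tasks on the current thread until the ready queue is empty.
/// Returns task ids in the order they finished.
fn run(tasks: Vec<Task>) -> Vec<usize> {
    let ready = Arc::new(Mutex::new((0..tasks.len()).collect::<VecDeque<_>>()));
    let mut tasks: Vec<Option<Task>> = tasks.into_iter().map(Some).collect();
    let mut finished = Vec::new();
    loop {
        // Pop in its own statement so the lock is released before polling.
        let next = ready.lock().unwrap().pop_front();
        let Some(id) = next else { break };
        let Some(task) = tasks[id].as_mut() else { continue };
        let waker = Waker::from(Arc::new(TaskWaker { id, ready: Arc::clone(&ready) }));
        let mut cx = Context::from_waker(&waker);
        if task.as_mut().poll(&mut cx).is_ready() {
            tasks[id] = None;
            finished.push(id);
        }
    }
    finished
}

/// Two toy pipelines: the consumer is scheduled first but has to wait.
fn demo() -> (u64, Vec<usize>) {
    let slot = Rc::new(RefCell::new(Slot::default()));
    let result = Rc::new(RefCell::new(0u64));
    let consumer: Task = {
        let (slot, result) = (Rc::clone(&slot), Rc::clone(&result));
        Box::pin(async move {
            let v = Recv(slot).await;
            *result.borrow_mut() = v * 2;
        })
    };
    let producer: Task = {
        let slot = Rc::clone(&slot);
        Box::pin(async move { send(&slot, 21) })
    };
    let order = run(vec![consumer, producer]);
    let value = *result.borrow();
    (value, order)
}
```

In `demo`, the consumer (task 0) is polled first, finds the slot empty, and leaves its waker behind. The producer (task 1) then fills the slot, which puts the consumer back on the ready queue. The scheduler never needed to know which task depended on which.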

This means we can do something a bit wacky: we spawn a JavaScript Promise for each PartitionPipeline, using wasm_bindgen_futures to convert a Rust Future into a JavaScript Promise. These Promises run on the current thread and, importantly, they don’t block.
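As a rough illustration of what handing a future to the browser can look like, here is a sketch built around `wasm_bindgen_futures::spawn_local`, which schedules a `'static` future on the current thread and returns immediately. The same crate also offers `future_to_promise` for callers that want the resulting Promise back. The `spawn_pipeline` helper and the native fallback (`block_on`, one thread per pipeline) are assumptions made so the example also builds outside the browser. They are not GlareDB's code, and the `wasm32` branch only compiles when the `wasm-bindgen-futures` crate is available on that target.

```rust
use std::future::Future;

/// Hypothetical helper: hand one pipeline's future to the platform's executor.
#[cfg(target_arch = "wasm32")]
pub fn spawn_pipeline<F>(pipeline: F)
where
    F: Future<Output = ()> + 'static,
{
    // Assumes `wasm-bindgen-futures` is a dependency on this target.
    // The future is queued on the browser's event loop and this call
    // returns right away, so nothing blocks.
    wasm_bindgen_futures::spawn_local(pipeline);
}

/// Native stand-in so the sketch builds off the web: one thread per pipeline.
#[cfg(not(target_arch = "wasm32"))]
pub fn spawn_pipeline<F>(pipeline: F)
where
    F: Future<Output = ()> + Send + 'static,
{
    std::thread::spawn(move || block_on(pipeline));
}

/// Minimal executor for the native stand-in: poll, and park until woken.
#[cfg(not(target_arch = "wasm32"))]
fn block_on<F: Future>(future: F) -> F::Output {
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};
    use std::thread::{self, Thread};

    struct ThreadWaker(Thread);
    impl Wake for ThreadWaker {
        fn wake(self: Arc<Self>) {
            self.0.unpark();
        }
    }

    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

Note the differing bounds: the browser variant does not require `Send` because everything stays on one thread, while the native stand-in does because the future moves to another thread.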

For network I/O inside of a FileSystem, we simply hook into the browser’s network stack, which is handled automatically for us thanks to Reqwest.

Now, this does mean that unlike our native runtime, running GlareDB in WebAssembly is implicitly single-threaded, with CPU and I/O work interleaved. This is a current limitation, and we’re actively working on addressing it.
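The general shape of "one engine, two runtimes" can be sketched with a trait. Everything below is a simplified assumption for illustration: the trait here only shares a name with GlareDB's PipelineRuntime, `Task`, `run_all`, `ThreadedRuntime`, `SingleThreadedRuntime`, and `sum_partitions` are hypothetical, and the single-threaded variant runs tasks inline rather than spawning Promises.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// A unit of CPU work; a hypothetical stand-in for a piece of a query plan.
pub type Task = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical trait: the engine hands over tasks and does not care how they run.
pub trait PipelineRuntime {
    /// Runs every task to completion before returning.
    fn run_all(&self, tasks: Vec<Task>);
}

/// Native flavor: one OS thread per task (a real runtime would use a pool).
pub struct ThreadedRuntime;

impl PipelineRuntime for ThreadedRuntime {
    fn run_all(&self, tasks: Vec<Task>) {
        let handles: Vec<_> = tasks
            .into_iter()
            .map(|task| thread::spawn(task))
            .collect();
        for handle in handles {
            handle.join().expect("task panicked");
        }
    }
}

/// Browser-style flavor: no threads, so tasks run one by one on the caller's thread.
pub struct SingleThreadedRuntime;

impl PipelineRuntime for SingleThreadedRuntime {
    fn run_all(&self, tasks: Vec<Task>) {
        for task in tasks {
            task();
        }
    }
}

/// Engine code is written once, against the trait.
pub fn sum_partitions<R: PipelineRuntime>(runtime: &R, partitions: Vec<Vec<u64>>) -> u64 {
    let total = Arc::new(AtomicU64::new(0));
    let tasks: Vec<Task> = partitions
        .into_iter()
        .map(|partition| {
            let total = Arc::clone(&total);
            Box::new(move || {
                total.fetch_add(partition.iter().sum::<u64>(), Ordering::SeqCst);
            }) as Task
        })
        .collect();
    runtime.run_all(tasks);
    total.load(Ordering::SeqCst)
}
```

Calling `sum_partitions` with either runtime gives the same answer; only the scheduling differs, which is the property that lets the rest of the engine stay unaware of the target.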

Dependencies Matter

GlareDB is written in Rust, which means we get to use Cargo as our dependency manager. Being able to easily install and manage dependencies is great, no matter the language.

But there is a downside to it being too easy.

Some dependencies simply don’t work with WebAssembly, or require significant effort to get there. Maybe a library relies on a standard library function that panics because there’s no sane implementation for WebAssembly. Maybe it depends heavily on Tokio and spawns background threads. Or maybe it pulls in a system dependency that can't be compiled for WebAssembly.

Our solution has been to be incredibly selective about which dependencies we bring in. It means we occasionally miss out on great libraries, but the tradeoff is worth it: a portable system with a well-understood, reliable dependency graph.

A Unified Shell

A recurring theme in this post has been ensuring a consistent user experience, and that includes the GlareDB Shell. Whether you’re using the CLI or running it in the browser on the homepage, it’s the same shell.

To make this possible, we had to pass on almost every existing shell or line editor library in Rust, since they all assume they’re running in a real terminal. So instead, we wrote our own: one that doesn’t assume a terminal, and instead relies purely on reading and writing bytes.

There’s still some polish to come, like adding command history and fixing a few rendering bugs, but we think the effort to provide a consistent, portable interface is absolutely worth it.

Future Work

We’re far from done. There’s still significant work happening within the core engine, as well as WebAssembly-specific improvements (like better shell rendering).

Every format or external catalog we plan to support, we also plan to support in WebAssembly, including scanning Delta Tables and connecting to Iceberg catalogs.

In the meantime, feel free to try it out on our homepage. And report any issues you find on our GitHub!