Workload Sharding
In v0.8.0, we added a few features to GlareDB to support what I've been thinking of as "workload sharding." This grew out of some work we were doing internally with GlareDB, and since we found the pattern useful, we thought you might as well.
Here's the situation: you're running multiple copies of a stateless application that needs to periodically process some data from GlareDB, and you want to ensure that each application server processes an equal share of the data and that every piece of data is processed exactly once.
There are lots of ways, in theory, to solve this problem: a shared queue, a distributed lock, or a communication protocol between the applications. But all of these are complex and hard to get right, or require additional infrastructure to support the application tier. We could also imagine a range-based partitioning solution, where each application server gets its own range of values in the result set. This would need no extra infrastructure, but such schemes are difficult too: it's hard to choose ranges that evenly distribute the data, and challenging to manage the assignment of ranges to application servers.
Consider the following:
- compute a hash for a field in the dataset with high cardinality: a user id or other internal identifier, a timestamp, or similar. Render the hash as an unsigned 64-bit integer.
- modulo the hash value by the number (`n`) of divisions or workers you want to have. Every record or result will have one of the `n` values in `[0-n)`.
- every worker should have an ID in the set `[0-n)`, and you can select results based on the modulo of the hash.
In another form, the operation looks something like:
hash(value) % num_workers == worker_id
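To make the arithmetic concrete, here's a minimal sketch with made-up numbers: suppose a record's hash is 17 and there are 3 workers. Each comparison below stands in for a different worker evaluating the predicate with its own ID, and exactly one of them comes back true.

```sql
-- hash(value) = 17, num_workers = 3: only worker 2 claims this record.
SELECT 17 % 3 = 0 AS worker_0_takes_it,  -- false
       17 % 3 = 1 AS worker_1_takes_it,  -- false
       17 % 3 = 2 AS worker_2_takes_it;  -- true
```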
This has a few good properties:
- even distribution of work among the workers with fairly effective shuffling of the work,
- no coordination between workers is required; aside from knowing their own ID, individual workers can be entirely oblivious to the sharding, and
- if the value you hash is stable (like an account id), then the assignment of work to workers doesn't change over time.
There are also a couple of caveats:
- changing the size of the worker pool gets a bit harder than it would be otherwise, since adding or removing a worker impacts the configuration of all workers.
- "pinning" work to specific workers means that problems with specific workers will always affect the same portion of the workload. If one worker crashes or has a hardware bug, then the work won't be redistributed.
On balance, these trade-offs are acceptable. It's not the only way to partition a workload, but it's a pretty good one.
You can implement this in your application, but if you do, every application instance runs the same query, gets the full result set, and then ignores the majority of the results. While it's hard to avoid doing the work for the query on the database end, having the database do the sharding calculation meaningfully reduces the application's work.
In support of this pattern, we added `fnv()` (64-bit, 1a) and `siphash()` (2-4) functions to GlareDB, both of which return `uint64` values.
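On their own, these two functions are enough to express the scheme by hand. As a rough sketch (the `events` table and `user_id` column here are hypothetical), a worker with ID 1 in a pool of 3 could filter its share directly in the `WHERE` clause:

```sql
-- Worker 1 of 3: hash the high-cardinality column inside the database
-- and keep only the rows assigned to this worker.
SELECT *
FROM events
WHERE fnv(user_id) % 3 = 1;
```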
While these two functions are enough to implement this sharding scheme, we also added a `partition_results` function to GlareDB to make things even easier. The function returns a boolean, so you can use it in a `WHERE` clause, and the signature looks like:
partition_results(<value>, <num_partitions>, <partition_id>)
You can pass any scalar as the value, or a column holding a scalar value, and it does the right thing, including validating that the number of partitions is greater than the partition ID. You can see it at work in a query, as in:
SELECT u.account_id, u.access_token, messages.*
FROM users AS u
JOIN messages ON (messages.user_id = u.account_id)
WHERE (u.active = true
AND u.deleted = false
AND messages.sent = false
AND messages.destination IS NOT NULL
AND partition_results(u.account_id, 3, 1));
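If you want to sanity-check how evenly the work splits, you can group on the hash directly. This sketch reuses the schema above and counts how many active users land in each of three `fnv()`-based buckets (it uses `fnv()` explicitly; whether `partition_results` hashes the same way internally isn't something we rely on here):

```sql
-- Count active, non-deleted users per fnv()-based partition.
SELECT fnv(u.account_id) % 3 AS partition_id,
       count(*) AS user_count
FROM users AS u
WHERE u.active = true
  AND u.deleted = false
GROUP BY fnv(u.account_id) % 3
ORDER BY partition_id;
```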
We hope you find it as useful as we do!
Get started now
Ready to get the most out of your data? Get started with GlareDB locally, or spin up a deployment on GlareDB Cloud!