This post was written by Devin, an AI agent, with significant input from me. As I’ve explored agent use cases, I’ve found them most useful in two areas: streamlining repetitive tasks and distilling complex information into something more digestible. This post showcases the latter.
When building our new engine, one of my goals was to make scalar and aggregate SQL functions dead-simple to implement. I wanted someone new to the system to be able to jump right in and implement a function without worrying about validating input counts, handling type casts, or managing selection vectors. The implementation should focus only on the core logic.
A happy side effect of this simplicity is that we can lean on Devin to scaffold and implement most functions automatically. Because the bulk of our built-in functions follow clear patterns, Devin can look at existing examples, replicate them, and even generate the corresponding SQL logic tests--then run them to verify correctness. It’s pretty incredible what AI agents can do when guided by a solid framework.
Check out some of pull requests that Devin's opened in our main repo.
AI agents excel when working with existing, well-defined patterns. Novel or creative logic remains the domain of human engineers. GlareDB is, and will continue to be, written mostly by people. We can focus on the innovative and creative parts of the system, while delegating the tedious tasks to agents.
We aim for transparency around AI contributions. Any work done by an agent in our open source repo is going to attributed to that agent, and blog posts like this one are clearly labeled.
Sean, Founder
SQL functions are a powerful feature in any database, allowing users to transform and manipulate data in various ways. In GlareDB, we've designed our function architecture to make implementing new SQL functions as simple as possible while ensuring they're performant and type-safe.
In this post, we'll explore how GlareDB's function architecture automatically handles vectorization and type casting, allowing function implementations to focus purely on business logic.
GlareDB's SQL function system is built on a few key components:
This architecture allows us to separate the concerns of function implementation from the mechanics of execution and type handling. Let's dive deeper into each component.
In GlareDB, scalar functions are implemented as structs that implement the
ScalarFunction
trait. This trait requires two methods:
bind
- Determines the return type and prepares any state needed for executionexecute
- Implements the actual function logicHere's a simplified example of how the REVERSE
function is implemented:
pub struct Reverse;
impl ScalarFunction for Reverse {
type State = ();
fn bind(&self, inputs: Vec<Expression>) -> Result<BindState<Self::State>> {
Ok(BindState {
state: (),
return_type: DataType::Utf8,
inputs,
})
}
fn execute(_state: &Self::State, input: &Batch, output: &mut Array) -> Result<()> {
let sel = input.selection();
let input = &input.arrays()[0];
let mut s_buf = String::new();
UnaryExecutor::execute::<PhysicalUtf8, PhysicalUtf8, _>(
input,
sel,
OutBuffer::from_array(output)?,
|v, buf| {
s_buf.clear();
s_buf.extend(v.chars().rev());
buf.put(&s_buf)
},
)
}
}
Notice that the implementation focuses solely on the logic of the function - reversing the characters in a string. The complexities of handling null values, different array formats, and vectorized execution are all abstracted away.
One of the key features of GlareDB's function architecture is automatic vectorization. Rather than processing data one row at a time, GlareDB functions operate on entire arrays of data at once, which is much more efficient.
This vectorization is handled by executors such as UnaryExecutor
,
BinaryExecutor
, and TernaryExecutor
. These executors take care of:
For example, the REVERSE
function uses UnaryExecutor
directly in its implementation:
// From the execute method of Reverse
let mut s_buf = String::new();
UnaryExecutor::execute::<PhysicalUtf8, PhysicalUtf8, _>(
input,
sel,
OutBuffer::from_array(output)?,
|v, buf| {
s_buf.clear();
s_buf.extend(v.chars().rev());
buf.put(&s_buf)
},
)
The UnaryExecutor
handles all the complexity of iterating over the array,
checking for null values, and applying the provided function to each element.
This allows the function implementation to focus solely on the transformation
logic.
Another powerful feature of GlareDB's function architecture is automatic type casting. When a function is called with arguments of the wrong type, GlareDB will automatically cast those arguments to the expected type if possible.
This is handled by a sophisticated casting system with implicit and explicit casts. Each cast has a score, and GlareDB will choose the cast with the highest score when multiple options are available.
For example, if you call a function that expects an integer with a string argument, GlareDB will automatically try to parse the string as an integer. This makes functions more flexible and user-friendly.
After implementing a function, we need to create a FunctionSet
- a collection
that contains the function's name, documentation, and implementations. This
doesn't register the function yet; it just defines a collection of related
information.
Here's how the REVERSE
function's set is defined:
pub const FUNCTION_SET_REVERSE: ScalarFunctionSet = ScalarFunctionSet {
name: "reverse",
aliases: &[],
doc: &[&Documentation {
category: Category::String,
description: "Reverse the input string.",
arguments: &["string"],
example: Some(Example {
example: "reverse('hello')",
output: "olleh",
}),
}],
functions: &[RawScalarFunction::new(
&Signature::new(&[DataTypeId::Utf8], DataTypeId::Utf8),
&Reverse,
)],
};
The Signature
defines the expected input and output types for the function. In
this case, the function takes a UTF-8 string and returns a UTF-8 string. During
SQL planning, the planner uses this signature to ensure the function is called
with the correct number of arguments and automatically applies type casts as
needed.
For the function to be available in SQL queries, it needs to be registered in
the catalog. In GlareDB, built-in functions are registered by adding their
function sets to the BUILTIN_SCALAR_FUNCTION_SETS
collection, which is
automatically loaded into the catalog on system startup:
pub static BUILTIN_SCALAR_FUNCTION_SETS: &[&ScalarFunctionSet] = &[
// String functions
&FUNCTION_SET_REVERSE,
&FUNCTION_SET_LOWER,
// ... many more functions
];
Functions provided by extensions are handled in a similar way. Extension developers can define their own function sets and register them with the catalog when the extension is loaded, allowing them to seamlessly integrate with the rest of the system.
GlareDB's function architecture makes it easy to implement new SQL functions while ensuring they're performant and type-safe. By separating the concerns of function implementation from the mechanics of execution and type handling, we can focus on the business logic of each function while letting the framework handle the rest.
This architecture has allowed us to implement over 70 built-in scalar functions, ranging from mathematical operations to string manipulation to date and time functions. And with the power of automatic vectorization and type casting, these functions are both flexible and performant.
If you're interested in learning more about GlareDB's internals or contributing to the project, check out our GitHub and documentation.