SQL Functions in GlareDB: Automatic Vectorization and Type Casting

Devin AI EngineerApril 22nd, 2025

This post was written by Devin, an AI agent, with significant input from me. As I’ve explored agent use cases, I’ve found them most useful in two areas: streamlining repetitive tasks and distilling complex information into something more digestible. This post showcases the latter.

When building our new engine, one of my goals was to make scalar and aggregate SQL functions dead-simple to implement. I wanted someone new to the system to be able to jump right in and implement a function without worrying about validating input counts, handling type casts, or managing selection vectors. The implementation should focus only on the core logic.

A happy side effect of this simplicity is that we can lean on Devin to scaffold and implement most functions automatically. Because the bulk of our built-in functions follow clear patterns, Devin can look at existing examples, replicate them, and even generate the corresponding SQL logic tests--then run them to verify correctness. It’s pretty incredible what AI agents can do when guided by a solid framework.

Check out some of pull requests that Devin's opened in our main repo.

AI agents excel when working with existing, well-defined patterns. Novel or creative logic remains the domain of human engineers. GlareDB is, and will continue to be, written mostly by people. We can focus on the innovative and creative parts of the system, while delegating the tedious tasks to agents.

We aim for transparency around AI contributions. Any work done by an agent in our open source repo is going to attributed to that agent, and blog posts like this one are clearly labeled.

Sean, Founder

Introduction

SQL functions are a powerful feature in any database, allowing users to transform and manipulate data in various ways. In GlareDB, we've designed our function architecture to make implementing new SQL functions as simple as possible while ensuring they're performant and type-safe.

In this post, we'll explore how GlareDB's function architecture automatically handles vectorization and type casting, allowing function implementations to focus purely on business logic.

Function Architecture Overview

GlareDB's SQL function system is built on a few key components:

Function Sets - Collections of function implementations with documentation
Function Traits - Interfaces for implementing different types of functions
Executors - Components that handle vectorized execution
Type Casting - Automatic conversion between data types

This architecture allows us to separate the concerns of function implementation from the mechanics of execution and type handling. Let's dive deeper into each component.

Function Implementation

In GlareDB, scalar functions are implemented as structs that implement the ScalarFunction trait. This trait requires two methods:

bind - Determines the return type and prepares any state needed for execution
execute - Implements the actual function logic

Here's a simplified example of how the REVERSE function is implemented:

pub struct Reverse;

impl ScalarFunction for Reverse {
    type State = ();

    fn bind(&self, inputs: Vec<Expression>) -> Result<BindState<Self::State>> {
        Ok(BindState {
            state: (),
            return_type: DataType::Utf8,
            inputs,
        })
    }

    fn execute(_state: &Self::State, input: &Batch, output: &mut Array) -> Result<()> {
        let sel = input.selection();
        let input = &input.arrays()[0];
        
        let mut s_buf = String::new();
        
        UnaryExecutor::execute::<PhysicalUtf8, PhysicalUtf8, _>(
            input,
            sel,
            OutBuffer::from_array(output)?,
            |v, buf| {
                s_buf.clear();
                s_buf.extend(v.chars().rev());
                buf.put(&s_buf)
            },
        )
    }
}

Notice that the implementation focuses solely on the logic of the function - reversing the characters in a string. The complexities of handling null values, different array formats, and vectorized execution are all abstracted away.

Automatic Vectorization

One of the key features of GlareDB's function architecture is automatic vectorization. Rather than processing data one row at a time, GlareDB functions operate on entire arrays of data at once, which is much more efficient.

This vectorization is handled by executors such as UnaryExecutor, BinaryExecutor, and TernaryExecutor. These executors take care of:

Handling null values automatically
Supporting different array formats (flat, selection)
Applying the function to each element in an array

For example, the REVERSE function uses UnaryExecutor directly in its implementation:

// From the execute method of Reverse
let mut s_buf = String::new();

UnaryExecutor::execute::<PhysicalUtf8, PhysicalUtf8, _>(
    input,
    sel,
    OutBuffer::from_array(output)?,
    |v, buf| {
        s_buf.clear();
        s_buf.extend(v.chars().rev());
        buf.put(&s_buf)
    },
)

The UnaryExecutor handles all the complexity of iterating over the array, checking for null values, and applying the provided function to each element. This allows the function implementation to focus solely on the transformation logic.

Automatic Type Casting

Another powerful feature of GlareDB's function architecture is automatic type casting. When a function is called with arguments of the wrong type, GlareDB will automatically cast those arguments to the expected type if possible.

This is handled by a sophisticated casting system with implicit and explicit casts. Each cast has a score, and GlareDB will choose the cast with the highest score when multiple options are available.

For example, if you call a function that expects an integer with a string argument, GlareDB will automatically try to parse the string as an integer. This makes functions more flexible and user-friendly.

Function Sets and Registration

After implementing a function, we need to create a FunctionSet - a collection that contains the function's name, documentation, and implementations. This doesn't register the function yet; it just defines a collection of related information.

Here's how the REVERSE function's set is defined:

pub const FUNCTION_SET_REVERSE: ScalarFunctionSet = ScalarFunctionSet {
    name: "reverse",
    aliases: &[],
    doc: &[&Documentation {
        category: Category::String,
        description: "Reverse the input string.",
        arguments: &["string"],
        example: Some(Example {
            example: "reverse('hello')",
            output: "olleh",
        }),
    }],
    functions: &[RawScalarFunction::new(
        &Signature::new(&[DataTypeId::Utf8], DataTypeId::Utf8),
        &Reverse,
    )],
};

The Signature defines the expected input and output types for the function. In this case, the function takes a UTF-8 string and returns a UTF-8 string. During SQL planning, the planner uses this signature to ensure the function is called with the correct number of arguments and automatically applies type casts as needed.

For the function to be available in SQL queries, it needs to be registered in the catalog. In GlareDB, built-in functions are registered by adding their function sets to the BUILTIN_SCALAR_FUNCTION_SETS collection, which is automatically loaded into the catalog on system startup:

pub static BUILTIN_SCALAR_FUNCTION_SETS: &[&ScalarFunctionSet] = &[
    // String functions
    &FUNCTION_SET_REVERSE,
    &FUNCTION_SET_LOWER,
    // ... many more functions
];

Functions provided by extensions are handled in a similar way. Extension developers can define their own function sets and register them with the catalog when the extension is loaded, allowing them to seamlessly integrate with the rest of the system.

Conclusion

GlareDB's function architecture makes it easy to implement new SQL functions while ensuring they're performant and type-safe. By separating the concerns of function implementation from the mechanics of execution and type handling, we can focus on the business logic of each function while letting the framework handle the rest.

This architecture has allowed us to implement over 70 built-in scalar functions, ranging from mathematical operations to string manipulation to date and time functions. And with the power of automatic vectorization and type casting, these functions are both flexible and performant.

If you're interested in learning more about GlareDB's internals or contributing to the project, check out our GitHub and documentation.