Introducing GlareDB: Query and analyze your data, wherever it lives
Today, a company’s data lives everywhere: in databases, warehouses, S3 buckets, martech tools, and everywhere in between. Extract, Transform, Load (ETL) pipelines are needed to move data or export it into a data warehouse. Production databases are not optimized for analytics and often represent only a segment of a company’s data. Warehouses are great at running analytics, and tools like dbt structure the data for analysis. But now there are two or more copies of the data, plus added latency in keeping those copies up to date. Accessing your data requires multiple tools, processes, and people before you can actually ask questions, extract insights, and drive decisions with your data.
GlareDB simplifies this process. GlareDB is a cloud-based query execution engine that hooks directly into your data sources (databases, data warehouses, object storage) and lets you query across them through a SQL interface, fast (like, really fast). We remove the need for ETL pipelines for your analytics needs entirely, while also reducing the lag between a change in a connected data source and seeing that change in your queries.
How it works
GlareDB hooks directly into databases and data warehouses. It pulls data from these sources, transforms it into a format that is more efficient for analytics, and then executes queries against that format.
GlareDB supports multiple data sources, allowing you to run analytics across them efficiently. It’s Postgres-compatible, and you can use your existing BI/visualization tooling.
As of initial launch, GlareDB integrates with the following data sources:
- Parquet and CSV files on S3 and GCS
Querying data sources
Let’s take a quick look at what querying data sources looks like. We’ve been hard at work making it as smooth as possible to add a data source. For example, to add BigQuery as a data source, we just need to run a single SQL command:
CREATE EXTERNAL DATABASE bq FROM bigquery OPTIONS ( project_id = SECRET gcp_project, service_account_key = SECRET gcp_sa );
(Don’t worry, adding data sources can also be done through the GlareDB dashboard.)
And that’s it. A single command added our BigQuery instance as a data source to GlareDB. We can now query the data source directly:
SELECT * FROM bq.glaredb_prod.execution_metrics;
But what if the data source we’re querying doesn’t have all the data we need? A powerful feature of GlareDB is that a single deployment can connect to multiple data sources, and a single query can read from several of them at once. Let’s dive into an example that joins data across our BigQuery and Postgres data sources.
As above, we just need to run a single SQL command to add in our Postgres data source:
CREATE EXTERNAL DATABASE prod FROM postgres OPTIONS ( connection_str = SECRET pg_prod_connection_str );
And now our query:
SELECT u.email, AVG(m.elapsed_compute_ns) FROM bq.glaredb_prod.execution_metrics m INNER JOIN prod.public.users u ON m.user_id = u.id GROUP BY u.email;
We’re able to reference our BigQuery and Postgres data sources in a single query, all without needing ETL to move data around. This can easily be extended to handle more complex workloads with more data sources as well.
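As one illustration of a slightly more complex workload, the join above could be extended to rank users by their average compute time. This is only a sketch in standard SQL: it reuses the table and column names from the earlier query, while the output aliases, the nanosecond-to-millisecond conversion, and the ten-row limit are assumptions made for the example. It has not been run against a live GlareDB deployment.

```sql
-- Rank users by average compute time, joining BigQuery metrics to Postgres users.
-- Assumes elapsed_compute_ns holds nanoseconds, as the column name suggests.
SELECT
    u.email,
    COUNT(*) AS executions,                                -- metric rows per user
    AVG(m.elapsed_compute_ns) / 1000000.0 AS avg_compute_ms -- ns -> ms
FROM bq.glaredb_prod.execution_metrics m
INNER JOIN prod.public.users u ON m.user_id = u.id
GROUP BY u.email
ORDER BY avg_compute_ms DESC
LIMIT 10;
```

Grouping by u.email yields one row per user, and the inner join means users with no recorded executions are left out of the result.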
Check out our docs to see more ways to work with your data in GlareDB.
Why use it
Here’s where we think GlareDB really shines in simplifying workflows for data and analytics engineers:
- Directly query data stored in S3
- Transform MongoDB collections into structured tables
- Connect BI tools to generate reports across multiple data sources
- Connect dbt to GlareDB to create models that span multiple sources
And it’s really fast. Our entire mission is making analytics easier and faster. We built GlareDB from the ground up to efficiently execute analytical queries, even for data that resides in systems like Postgres where performance suffers with these types of workloads. Behind the scenes, GlareDB goes as low-level as possible to fetch data from the source, and transforms it into a format more suitable for analytics. Analytical queries that are slow on Postgres or MySQL run efficiently through GlareDB. Want instant analytics on your Postgres data without moving it into Snowflake or another warehouse? Come try GlareDB!
Today we’re expanding access to the GlareDB Technical Preview. If you’re excited about running analytics on multiple sources of data, or you’re tired of having to manage ETL pipelines, we would love to hear from you. Sign up for free, or reach out directly at firstname.lastname@example.org.