If you work with data regularly, you've probably run into this situation: Your data lives in different places, but you need to analyze it all together to answer important questions. Let's talk about how query federation can help solve this common headache.
Prefer a video? Watch it here, or continue reading below:
Here are some typical scenarios in modern data architectures:
Picture these scenarios:
I'm kidding about the Excel file... but only kind of. We've all been there!
The traditional approach to data integration involves setting up ETL pipelines to consolidate data in one location. However, this method introduces several challenges:
This traditional approach is kind of like going into your kitchen to cook a stir-fry and discovering you first need to:
Query federation enables real-time access, analysis, and joining of data from multiple sources within a single SQL query. Instead of executing complex ETL processes, query federation treats distributed data sources as a unified database.
Using our cooking analogy: Rather than establishing complex infrastructure to move ingredients, query federation provides instant access to all ingredients, wherever they're stored.
Consider this real-world business analytics scenario: analyzing top products by sales volume and customer satisfaction scores.
The data sources include:
Using query federation, for instance with GlareDB, a single SQL query can join these diverse data sources seamlessly. When new data arrives in any format, including Excel files, it integrates naturally into your existing queries. In this example, you could run a complex, cross-data source query to compare sales transactions in Postgres, product information in Snowflake, and customer survey data in Excel files on S3, without any need for data migration.
SELECT
p.product_id,
p.product_name,
p.category,
SUM(t.total_amount) as total_sales,
AVG(s.satisfaction_score) as avg_satisfaction,
COUNT(DISTINCT s.survey_id) as num_surveys
FROM
read_postgres(
'postgresql://user:password@location:5433/postgres',
'public',
'transactions'
) t
INNER JOIN
read_snowflake(
'snowflake_account.us-central1.gcp',
'snowflake_username',
'snowflake_password',
'sandbox',
'compute_wh',
'accountadmin',
'public',
'products'
) p
ON t.product_id = p.product_id
LEFT JOIN
read_excel(
's3://my-bucket/latest_survey_data.xlsx',
access_key_id => 'aws_access_key_id',
secret_access_key => 'aws_secret_access_key',
region=>'us-east-2'
) s ON t.product_id = s.product_id
GROUP BY
p.product_id, p.product_name, p.category
HAVING num_surveys > 3
ORDER BY
total_sales DESC
LIMIT 20;
Query federation eliminates complex data infrastructure requirements for basic data analysis. The technology supports multiple workflows:
The only requirement is basic Python knowledge or terminal access - no complex infrastructure setup needed.
The true value of query federation extends beyond workflow simplification. It fundamentally changes how teams interact with distributed data, shifting focus from data movement to data analysis and insight generation.
Check out our documentation to begin implementing query federation in your data operations. Join our community for support and discussion of best practices in distributed data access. Subscribe on YouTube, LinkedIn, or X to learn more about best practices for data storage, fundamentals of database internals, and product updates for GlareDB. And sign up for GlareDB Cloud to make use of hybrid execution and effortlessly collaborate with your team on your data no matter where they're stored!