May 21, 2024

Join Snowflake with Postgres Data Using Data Federation

Tal Gluck, Developer Advocate

Given all of the options for data infrastructure today, it's common to have your data spread across multiple data sources. What do you do if you want to answer questions that rely on more than one of them? You can set up a bunch of infrastructure to move all of your data to the same place, but that's not always practical or convenient. Today, there is a simpler option: data federation, also known as query federation. Data federation lets you treat data in different locations and formats as though it all lived in one place, so you can write queries that join across those sources. In this example, we'll show you how you can quickly get started writing queries to join data between Snowflake and Postgres. You can follow along below, or check out the video walkthrough.

The Scenario

Let's say we have a table of New York City real estate sales data stored in Snowflake. This table contains various fields with information about property locations and transactions. One of the fields is the borough ID where each property is located.

However, the lookup table that maps borough IDs to borough names is stored in PostgreSQL. How can we efficiently combine this information? GlareDB makes it simple.

Setting Up Your Environment

For this demonstration, we'll use a Jupyter Notebook, but you can also use the GlareDB Cloud workspace, CLI, or Node.js bindings. Here's how to get started:

  1. Install the GlareDB Python bindings:

    pip install glaredb
    
  2. Import GlareDB in your Python environment:

    import glaredb
    
  3. Set up a connection to GlareDB (we'll run a quick sanity check on this connection right after these steps):

    con = glaredb.connect()
    
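With the connection in place, it's worth a quick sanity check before pointing GlareDB at external data sources. This minimal sketch just runs a trivial query through the same con.sql() / .to_pandas() path used in the rest of this post:

import glaredb

con = glaredb.connect()

# A trivial query to confirm the bindings and the connection work.
print(con.sql("SELECT 1 AS ok").to_pandas())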

Querying Data from Different Sources

GlareDB provides specific functions for reading from various databases. In our case, we'll use read_postgres for the PostgreSQL lookup table and read_snowflake for the Snowflake sales table.

Here's an example of how to query the PostgreSQL lookup table:

# read_postgres takes a connection string, a schema name, and a table name,
# and exposes that table to GlareDB as if it were local.
postgres_table = con.sql(
    f"""
    SELECT * FROM
        read_postgres(
            'host=pg.demo.glaredb.com port=5432 user={postgres_user} password={postgres_password} dbname=postgres',
            'public',
            'borough_lookup'
        )
    """
).to_pandas()

postgres_table

You run your query with con.sql(), then convert the result to a pandas DataFrame of borough names and IDs with .to_pandas().
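The queries in this post assume that variables like postgres_user and postgres_password are already defined. One simple way to do that without hard-coding credentials is to read them from environment variables; the variable names below are just placeholders, so use whatever your environment provides:

import os

# Placeholder environment variable names -- set these to match however you
# store credentials for your own Postgres and Snowflake instances.
postgres_user = os.environ["POSTGRES_USER"]
postgres_password = os.environ["POSTGRES_PASSWORD"]
snowflake_account = os.environ["SNOWFLAKE_ACCOUNT"]
snowflake_user = os.environ["SNOWFLAKE_USER"]
snowflake_password = os.environ["SNOWFLAKE_PASSWORD"]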

Joining Data Across Databases

Now for the exciting part: joining data from Snowflake and Postgres. You can treat the results of your calls to the read_... functions as tables within the same database.

joined_data = con.sql(
    f"""
    SELECT postgres.borough_name, snowflake.*
    FROM
        read_snowflake(
            '{snowflake_account}.us-central1.gcp',
            '{snowflake_user}',
            '{snowflake_password}',
            'sandbox',
            'compute_wh',
            'accountadmin',
            'public',
            'nyc_sales'
        ) snowflake
    LEFT JOIN
        read_postgres(
            'host=pg.demo.glaredb.com port=5432 user={postgres_user} password={postgres_password} dbname=postgres',
            'public',
            'borough_lookup'
        ) postgres
        ON snowflake.borough = postgres.borough_id
    LIMIT 100
    """
).to_pandas()

joined_data

This query joins the Snowflake real estate data with the PostgreSQL borough lookup table based on the borough ID. The result is a pandas DataFrame that includes all fields from the Snowflake table plus the borough name from PostgreSQL.
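Since joined_data is an ordinary pandas DataFrame, the usual pandas tooling applies from here. As a quick sketch, here's a tally of the returned rows by borough, using the borough_name column selected in the join:

# Count how many of the returned rows fall in each borough.
sales_per_borough = joined_data["borough_name"].value_counts()
print(sales_per_borough)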

If you find that you're often querying the same data sources, you can add your Snowflake or PostgreSQL instances to GlareDB as external data sources, making it easier to query multiple tables without repeatedly entering credentials.
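As a rough sketch of what that can look like for the Postgres instance used here (the option names and exact syntax are illustrative assumptions, so check the GlareDB data sources documentation before relying on them):

# Illustrative sketch only -- exact CREATE EXTERNAL DATABASE options may differ;
# see the GlareDB docs. Registering the database once means later queries can
# reference my_postgres.public.borough_lookup directly instead of passing
# credentials to read_postgres every time.
con.execute(f"""
    CREATE EXTERNAL DATABASE my_postgres
        FROM postgres
        OPTIONS (
            host = 'pg.demo.glaredb.com',
            port = '5432',
            user = '{postgres_user}',
            password = '{postgres_password}',
            database = 'postgres'
        );
""")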

Conclusion

By using data federation and GlareDB, joining data from different data sources becomes a straightforward process. In just a few lines of code, we've combined data from Snowflake and PostgreSQL, eliminating the need for complex ETL processes or data migrations.

Happy data joining!

Get started now

Ready to get the most out of your data? Get started with GlareDB locally, or spin up a deployment on GlareDB Cloud!

$ curl https://glaredb.com/install.sh | sh
Try GlareDB Cloud