🚀 GlareDB Pro now available:Simple, powerful, flexible serverless compute
producthow-to
July 10, 2024

Write PostgreSQL Data to Delta Tables with SQL

author image
Tal Gluck, Developer Advocate
Blog post cover image

If you're dealing with transaction data in PostgreSQL you may be looking for ways to move it out for improved analytics or archive it without losing access. The Delta table format is a great option for this kind of storage. In this post, we'll walk you through how to use GlareDB to query PostgreSQL data and write it to Delta tables, all with SQL, creating a seamless archiving process. You can follow along below, or check out the video walkthrough:

Why Use Delta Tables?

Delta tables are an open-source storage format that brings ACID transactions to big data workloads. They're particularly useful for data lakes and offer benefits like:

  1. Improved query performance
  2. Time travel (data versioning)
  3. Schema evolution
  4. ACID transactions

The Process: From PostgreSQL to Delta Tables

We'll break this down into three main operations:

  1. Creating a Delta table from PostgreSQL data
  2. Adding the Delta table as an external table in GlareDB
  3. Inserting new data into the Delta table

For this guide, we've taken this CSV of web event data, and loaded it into a Postgres instance. You can do that, or else select directly from the CSV file on GitHub, or download it and select from it locally, using read_csv.

Let's dive in!

Step 1: Creating a Delta Table from PostgreSQL Data

First, ensure you have GlareDB installed in your Python environment:

pip install glaredb

In a Python shell, import GlareDB and set up a GlareDB connection:

import glaredb
con = glaredb.connect()

Write a query to see the data in your Postgres table using read_postgres:

con.sql(
"""
    SELECT * FROM read_postgres(
        'postgresql://postgres:postgres@localhost:5433/postgres', 
        'public', 
        'web_events'
        )
    LIMIT 10
""").to_pandas()

Now, use the query above with COPY TO to select from the Postgres table write it to a Delta table. Ensure the directory for your Delta table exists and is empty before running this command.

con.sql("""
    COPY (
        SELECT * FROM read_postgres(
            'postgresql://postgres:postgres@localhost:5433/postgres',
            'public',
            'web_events'
        ) 
        ORDER BY 
            event_id ASC 
        LIMIT 1000
    ) 
    TO 'file:///Users/path/to/my_delta/' 
    FORMAT delta
""")

Step 2: Adding the Delta Table as an External Table in GlareDB

Once you have your Delta table, add it as an external table in GlareDB:

con.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS delta_web_events
    FROM delta
    OPTIONS (LOCATION 'file:///Users/path/to/my_delta/')
""");

Now, query the Delta table as if it were any other table in our GlareDB instance:

con.sql("""
    SELECT * FROM delta_web_events
""").to_pandas()

Step 3: Inserting New Data into the Delta Table

As new events are recorded in PostgreSQL, you can insert them into your Delta table. First, set the access mode to read-write:

con.sql(
    """
        ALTER TABLE delta_web_events SET ACCESS_MODE TO READ_WRITE;
    """).to_pandas()

Then, insert the new data:

con.sql(
"""
    INSERT INTO 
        delta_web_events
    SELECT 
        pg.* 
    FROM 
        read_postgres(
            'postgresql://postgres:postgres@localhost:5433/postgres', 
            'public', 
            'web_events') pg
    LEFT JOIN 
        delta_web_events d
    ON 
        pg.event_id = d.event_id
    WHERE 
        d.event_id IS NULL
""")

This query inserts only the new records that aren't already in the Delta table.

Verifying Your Data

You can easily verify that your data has been successfully archived by querying your external table:

con.sql("""select * from delta_web_events ORDER BY event_id""").to_pandas()

Conclusion

Using GlareDB to archive PostgreSQL data to Delta tables offers a flexible and efficient solution for data management. This approach allows you to maintain easy access to your historical data while improving your analytics capabilities and reducing the load on your transactional database.

By following these steps, you can set up a robust archiving system that grows with your data needs. Whether you're dealing with web events, financial transactions, or any other type of data, this method provides a scalable approach to data archiving and analytics.

Remember, while we've focused on PostgreSQL in this example, GlareDB's versatility allows you to apply the same techniques to query data from various data sources, and archive it to various locations, including local storage, S3, and GCS.

Happy archiving!

Links:

Get started now

Ready to get the most out of your data? Get started with GlareDB locally, or spin up a deployment on GlareDB Cloud!

$curl https://glaredb.com/install.sh | sh
Try GlareDB Cloud