
Trino --> Snowflake

Overview

In 2024, we transitioned from the Trino SQL engine to the integrated Snowflake SQL engine.

Below is a summary of the changes between the two engines and the actions you need to take to get up to speed.

Snowflake connection code

Connecting to Snowflake requires different connection code than connecting to the Trino cluster. Replace the following:

# SQL engine
from trino.dbapi import connect 
from sqlalchemy import create_engine
import pandas as pd

class TrinoEngine():
    def __init__(self):
        conn = connect(
            host="localhost",
            port=9090,
            catalog="cuebiq"
        )
        self.cur = conn.cursor()
        self.engine = create_engine("trino://localhost:9090/cuebiq/")

    def execute_statement(self, query:str) -> list:
        """
        Create and drop statements.
        """
        self.cur.execute(query)
        return self.cur.fetchall()

    def read_sql(self, query:str) -> pd.DataFrame: 
        """
        Select and insert into operations.
        """
        return pd.read_sql(query, self.engine)

sql_engine = TrinoEngine()

with:

%load_ext cuebiqmagic.magics
%init_cuebiq_data 
snow_engine = get_ipython().user_ns['instance']
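
Once the extension is initialized, queries run through the returned instance. The sketch below is a usage illustration only: the method names mirror the Trino wrapper above and are assumptions, not the documented cuebiqmagic interface; see the Spectus tutorials for the exact API.

# Hypothetical usage sketch -- method names are assumptions, not the
# documented cuebiqmagic API; consult the Spectus tutorials for the
# exact interface exposed by the initialized instance.
df = snow_engine.read_sql("SELECT CURRENT_DATE AS today")    # SELECT-style queries into a DataFrame
snow_engine.execute_statement("CREATE TABLE my_schema.my_table (id INT)")  # DDL statements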

Provider ID

During the migration to Snowflake, the provider_id format was updated to replace the six-digit provider identifier with a more accessible, user-friendly phonetic alphabet identifier. To assist with migration, a translation table, CUEBIQ_DATA.DATA_PROVIDER.PROVIDER_DETAIL, was created within the CUEBIQ_DATA database.
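
For example, a quick inspection of the translation table might look like the sketch below; only the table name is taken from this page, and the query can be run through whichever Snowflake connection you use.

# Sketch: inspect the translation table that maps the legacy six-digit
# provider identifiers to the new phonetic-alphabet identifiers.
# Only the table name comes from this page; run the query through your
# Snowflake connection (e.g. the snow_engine instance above).
query = """
SELECT *
FROM CUEBIQ_DATA.DATA_PROVIDER.PROVIDER_DETAIL
LIMIT 10
"""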

Additionally, a dedicated tutorial notebook is provided within the JupyterLab instance to illustrate best practices when querying this table. You can find this notebook within the data-clean-room-help-snowflake/getting-started folder.

Spectus Tutorials

Within the Jupyter interface, the Spectus Tutorials have been cloned and updated to include the above code. The Snowflake-specific versions of the tutorials are available in the data-clean-room-help-snowflake folder.

New tutorials will only be available for the Snowflake connector.

Dataflow Studio impacts

To perform the tasks previously undertaken by Dataflow Studio, you will now utilize the integrated functionality of Snowflake Snowsight (see the Snowflake User Guide).

Here is an overview of the tool migrations:

Exports

Exports will continue to be processed via the same Cuebiq-hosted S3 bucket linked to your existing tenant; see the Snowflake User Guide for how to perform this action.
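
As a rough sketch, an unload to the tenant's S3 bucket typically uses a COPY INTO <stage> statement; the stage name, path, table, and file format below are placeholders, and the Snowflake User Guide covers the tenant-specific setup.

# Sketch only: '@my_export_stage' and the table name are placeholders for
# the stage linked to your Cuebiq-hosted S3 bucket; see the Snowflake User
# Guide for the tenant-specific stage and file-format options.
export_sql = """
COPY INTO @my_export_stage/exports/my_results/
FROM (SELECT * FROM MY_DATABASE.MY_SCHEMA.MY_RESULTS)
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
"""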

Running jobs/tasks via the API

To run and monitor tasks via the API, the Snowflake API has been integrated to permit EXECUTE TASK and CALL commands. For full details, see the dedicated documentation, Spectus Snowflake API.
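
For orientation, the generic Snowflake SQL API accepts statements via a POST to /api/v2/statements. The sketch below uses placeholder account, token, and task names; the Spectus Snowflake API documentation describes the exact endpoint and authentication used in Spectus.

import requests

# Sketch of the generic Snowflake SQL API shape (POST /api/v2/statements).
# Account URL, token, and task name are placeholders; see the Spectus
# Snowflake API documentation for the endpoint and authentication to use.
resp = requests.post(
    "https://<account>.snowflakecomputing.com/api/v2/statements",
    headers={
        "Authorization": "Bearer <token>",   # placeholder credential
        "Content-Type": "application/json",
    },
    json={"statement": "EXECUTE TASK MY_DB.MY_SCHEMA.MY_TASK"},
)
resp.raise_for_status()
print(resp.json())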

SQL Command changes

For a comprehensive list of SQL commands available through Snowflake, see the Snowflake SQL Command Reference page.

Below is a list of the common SQL commands updated within the Spectus tutorials; this will assist clients as they migrate their code to use Snowflake through Spectus.

Trino SQL                  Snowflake SQL
ST_GEOMETRYFROMTEXT        ST_GEOGRAPHYFROMWKT
ARRAY_JOIN                 ARRAY_TO_STRING
SLICE                      ARRAY_SLICE
FROM_ISO8601_TIMESTAMP     TO_TIMESTAMP
DATE_ADD                   DATEADD
GREAT_CIRCLE_DISTANCE      HAVERSINE
...                        ...
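
For example, a Trino query using the functions above might be rewritten for Snowflake as follows; the table and column names are placeholders.

# Placeholder table and column names; the function mapping follows the table above.

# Trino (before)
trino_sql = """
SELECT great_circle_distance(lat_a, lng_a, lat_b, lng_b) AS distance_km,
       from_iso8601_timestamp(event_ts) AS event_time
FROM my_schema.my_events
"""

# Snowflake (after)
snowflake_sql = """
SELECT haversine(lat_a, lng_a, lat_b, lng_b) AS distance_km,
       to_timestamp(event_ts) AS event_time
FROM my_schema.my_events
"""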