Trino --> Snowflake¶
Overview¶
In 2024, we transitioned from the Trino SQL engine to the Snowflake integrated SQL engine.
Here is a list of the changes between the two engines and the actions you need to take to get up to speed.
Snowflake connection code¶
Connecting to Snowflake requires different connection code than connecting to the Trino cluster. You'll replace the following:
# SQL engine
from trino.dbapi import connect
from sqlalchemy import create_engine
import pandas as pd

class TrinoEngine():
    def __init__(self):
        conn = connect(
            host="localhost",
            port=9090,
            catalog="cuebiq"
        )
        self.cur = conn.cursor()
        self.engine = create_engine("trino://localhost:9090/cuebiq/")

    def execute_statement(self, query: str) -> list:
        """
        Create and drop statements.
        """
        self.cur.execute(query)
        return self.cur.fetchall()

    def read_sql(self, query: str) -> pd.DataFrame:
        """
        Select and insert into operations.
        """
        return pd.read_sql(query, self.engine)

sql_engine = TrinoEngine()
with:
%load_ext cuebiqmagic.magics
%init_cuebiq_data
snow_engine = get_ipython().user_ns['instance']
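For reference, here is a minimal usage sketch. The methods on snow_engine are an assumption modeled on the TrinoEngine wrapper above; check the Spectus tutorial notebooks for the interface that is actually exposed.

```python
# Minimal usage sketch -- snow_engine.read_sql is assumed to behave like the
# TrinoEngine.read_sql helper above; check the tutorial notebooks for the
# interface that is actually exposed.
df = snow_engine.read_sql("SELECT CURRENT_DATE() AS today")  # assumed helper
print(df.head())

# The equivalent call with the old Trino wrapper would have been:
# df = sql_engine.read_sql("SELECT current_date AS today")
```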
Provider ID¶
During the migration to Snowflake, the provider_id format was updated to replace the six-digit provider identifier with a more accessible, user-friendly phonetic alphabet identifier. To assist with the migration, a translation table, CUEBIQ_DATA.DATA_PROVIDER.PROVIDER_DETAIL, was created within the CUEBIQ_DATA database.
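As a quick sketch, the translation table can be previewed directly from a notebook; the column layout is not assumed here, and the tutorial notebook described below illustrates the recommended query patterns.

```python
# Illustrative sketch only: snow_engine.read_sql is assumed to behave like the
# TrinoEngine.read_sql helper shown earlier. No column names are assumed --
# the query simply previews the translation table.
provider_map = snow_engine.read_sql("""
    SELECT *
    FROM CUEBIQ_DATA.DATA_PROVIDER.PROVIDER_DETAIL
    LIMIT 10
""")
print(provider_map.head())
```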
Additionally, a dedicated tutorial notebook is provided within the JupyterLab instance to illustrate best practices when querying this table. You can find this notebook within the data-clean-room-help-snowflake/getting-started folder.
Spectus Tutorials¶
Within the Jupyter interface, the Spectus Tutorials have been cloned and updated to include the above code. The Snowflake-specific versions of the tutorials are available in the data-clean-room-help-snowflake folder.
New tutorials will only be available with the Snowflake connector.
Dataflow Studio impacts¶
To perform the tasks previously undertaken by Dataflow Studio, you will now use the integrated functionality of Snowflake Snowsight (see the Snowflake User Guide).
Here is an overview of the tool migrations:
Exports¶
Exports will continue to be processed via the same Cuebiq-hosted S3 bucket linked to your existing tenant; see the Snowflake User Guide for how to perform this action.
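As a rough sketch only, an unload to the bucket can be expressed with COPY INTO. The stage name EXPORT_STAGE below is a placeholder, and execute_statement is an assumed helper analogous to the Trino wrapper; the actual export workflow for your tenant is described in the Snowflake User Guide.

```python
# Sketch: unload query results to the Cuebiq-hosted bucket through a stage.
# EXPORT_STAGE is a placeholder name -- see the Snowflake User Guide for the
# stage configured for your tenant.
snow_engine.execute_statement("""
    COPY INTO @EXPORT_STAGE/my_export/
    FROM (SELECT CURRENT_DATE() AS export_date)
    FILE_FORMAT = (TYPE = CSV)
    OVERWRITE = TRUE
""")
```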
Running jobs/tasks via the API¶
In order to run and monitor tasks via the API, the Snowflake API has been integrated to permit EXECUTE TASK and CALL commands. For full details, see the dedicated documentation, Spectus Snowflake API.
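For example, a task or stored procedure already defined in your schema can be triggered with statements like the ones below. The object names are placeholders, and submitting these statements through the API itself is covered in the Spectus Snowflake API documentation.

```python
# Placeholders: MY_SCHEMA.MY_TASK and MY_SCHEMA.MY_PROC are examples only, and
# execute_statement is an assumed helper analogous to the Trino wrapper.
snow_engine.execute_statement("EXECUTE TASK MY_SCHEMA.MY_TASK")
snow_engine.execute_statement("CALL MY_SCHEMA.MY_PROC('2024-01-01')")
```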
SQL Command changes¶
For a comprehensive list of the SQL commands available through Snowflake, see the Snowflake SQL Command Reference page.
Below is a list of the common SQL commands updated within the Spectus tutorials; this will assist clients as they migrate their code to use Snowflake through Spectus.
| Trino SQL | Snowflake SQL |
| --- | --- |
| ST_GEOMETRYFROMTEXT | ST_GEOGRAPHYFROMWKT |
| ARRAY_JOIN | ARRAY_TO_STRING |
| SLICE | ARRAY_SLICE |
| FROM_ISO8601_TIMESTAMP | TO_TIMESTAMP |
| DATE_ADD | DATEADD |
| GREAT_CIRCLE_DISTANCE | HAVERSINE |
| ... | ... |
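To illustrate how the mappings above apply in practice, here is a small before/after rewrite of a query; the table and column names are placeholders, only the function substitutions come from the table above.

```python
# Placeholder table/column names; only the function substitutions matter here.

# Trino version
trino_sql = """
SELECT ARRAY_JOIN(tags, ',')          AS tag_list,
       DATE_ADD('day', 7, event_date) AS next_week
FROM events
"""

# Snowflake version
snowflake_sql = """
SELECT ARRAY_TO_STRING(tags, ',')     AS tag_list,
       DATEADD(day, 7, event_date)    AS next_week
FROM events
"""
```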