Snowflake User Guide¶
Overview¶
Snowflake is an elastically scalable cloud data warehouse. By leveraging the power of Snowflake, Spectus can provide an enhanced data exploration experience to our clients.
From 2024, Snowflake will be the main SQL engine used by Spectus users. Working interactively through the JupyterLab interface remains a core offering of the platform. In addition, a new UI is available in the form of Snowsight from Snowflake.
Snowsight Interface¶
Snowflake Snowsight has been integrated into the set of tools on Spectus Builder to provide a seamless solution for multiple processes, including:
- Easily referencing the on-platform Cuebiq data assets
- Visualizing and Monitoring Compute usage
- Scheduling Tasks/Procedures
In this User Guide you will find an overview of the main features of Snowsight relevant to Spectus users: 1. Projects, 2. Data, and 3. Admin. To learn more about the Snowsight interface, see the Snowsight User Guide provided by Snowflake.
Projects¶
Worksheets¶
Worksheets provide a simple way for you to write SQL queries (DML and DDL), see the results, and interact with them. With a worksheet, you can do any of the following:
- Run ad hoc queries and other DDL/DML operations.
- Write Snowpark Python code in a Python worksheet.
- Review the query history and results of queries that you executed.
- Examine multiple worksheets, each with its own separate session.
- Export the results for a selected statement, while results are still available.
For a full understanding of worksheet capabilities, see the Snowflake Documentation or the "How Do I...?" section below for specific workflows common to Spectus users.
Data¶
Databases¶
You can explore databases and objects, including tables, functions, views, and more using the database object explorer.
Through the Spectus platform, clients have access to three main databases:
- CUEBIQ_DATA consists of views of the data hosted directly by Cuebiq (change log). For a full introduction to the data available on the platform, please contact your Customer Success Manager or Book a Demo with our team.
- DEDICATED is a local database available for your team to create schemas, tables, etc. as required for your project. For more details see Dedicated Workspace.
- SNOWFLAKE is a database that stores information about your usage of the platform to facilitate monitoring.
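If you prefer to explore these databases from a SQL Worksheet rather than the object explorer, standard Snowflake SHOW commands work as well, for example:
SHOW DATABASES;
SHOW SCHEMAS IN DATABASE CUEBIQ_DATA;
SHOW TABLES IN SCHEMA CUEBIQ_DATA.PAAS_CDA_V3;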
Admin¶
Cost Management¶
Snowflake provides visibility into your team's usage through its Cost Management dashboard. This is available through the Admin >> Cost Management menu item.
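Beyond the dashboard, the same usage data can be queried directly from the SNOWFLAKE database mentioned above. A minimal sketch, assuming your role has access to the standard ACCOUNT_USAGE views:
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;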
How do I...?¶
In this section you will find the workflows associated with some of the more common actions performed by Spectus users on the Snowflake platform:
- Connect to Snowflake SQL Engine from within JupyterLab
- Preview the contents of database tables
- Run a SQL query in Snowsight
- Create a scheduled Task
- Run Python Code in Snowsight
- Create a Python Stored Procedure
- Create a procedure with runtime variables
- Use the Data Event Trigger
- Import/Export a table via S3
- Integrate with git
- Create an output logger
- Connect via the Snowflake API
Connect to Snowflake SQL Engine within JupyterLab¶
SQL queries run through the JupyterLab interface are now powered by the Snowflake SQL query engine.
To connect to Snowflake, simply copy and paste the contents below into a notebook cell and run it. You will need to run this cell in each notebook each time you start a session on the platform.
%load_ext cuebiqmagic.magics
%init_cuebiq_data
snow_engine = get_ipython().user_ns['instance']
Once you run the command, you should receive a "Connected" message as below:
Preview the contents of database tables¶
New with the introduction of Snowflake is the ability to preview the contents of a table through the Graphical Interface.
From within the Data >> Databases screen:
- Select the database of interest (e.g. CUEBIQ_DATA)
- Select the schema of interest (e.g. PAAS_CDA_V3)
- Select the table of interest (e.g. GEOGRAPHY_REGISTRY)
- Select Data Preview
- Select a warehouse to run the preview through (default COMPUTE_WH)
- Click Preview
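If you prefer SQL, an equivalent ad hoc preview can be run from a worksheet with a simple LIMIT query, for example:
SELECT *
FROM cuebiq_data.paas_cda_v3.geography_registry
LIMIT 10;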
Run a SQL Query in Snowsight¶
Create a SQL Worksheet from Projects >> Worksheets
Within the SQL Worksheet you can run individual SQL queries by highlighting the query and running it with the blue arrow.
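For example, with two statements in the worksheet, highlighting only the second one and clicking the blue arrow runs just that statement:
SELECT CURRENT_WAREHOUSE(), CURRENT_ROLE();
SELECT COUNT(*) FROM cuebiq_data.paas_cda_v3.geography_registry;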
Create a Scheduled Task¶
Within a SQL Worksheet you can create a TASK. A task can execute any one of the following types of SQL code:
- Single SQL statement
- Call to a stored procedure
Create a SQL Worksheet from Projects >> Worksheets
- Use the CREATE TASK command to build the task either from a single SQL statement or a previously prepared procedure (see how to create a procedure below and the example task below)
- Highlight the command
- Click the blue Run button
Newly created or cloned tasks are created suspended. For information on resuming suspended tasks see ALTER TASK … RESUME.
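As a sketch (the task, schema, and procedure names here are placeholders), a task that calls a stored procedure on a daily CRON schedule, and is then resumed, could look like this:
CREATE TASK dedicated.my_schema.my_daily_task
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL dedicated.my_schema.my_procedure();

-- Newly created tasks are suspended until explicitly resumed
ALTER TASK dedicated.my_schema.my_daily_task RESUME;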
You can also manage your created TASK under Data >> Databases. Select the … actions button to edit the task name or comment, clone the task, drop the task, or transfer ownership of the task to another role.
Running Python Code in Snowsight¶
Create a Python Worksheet from Projects >> Worksheets
Within the Python worksheet you can add code within the MAIN function. Additionally, you can import packages using the Anaconda package manager.
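A minimal sketch of a Python worksheet body, mirroring the handler structure used in the stored-procedure examples below (the table name is only an illustration):
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # Read a table, keep a small sample, and return it so Snowsight displays the result
    df = session.table("cuebiq_data.paas_cda_v3.geography_registry").limit(10)
    return df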
Creating a Python Stored Procedure¶
The Snowflake guide on creating a procedure from a python worksheet can be found here.
You can now manage your created PROCEDURE under Data >> Databases.
This stored procedure can then be called using the SQL CALL command, and when creating a TASK, in a similar way to the method discussed above for SQL code (see Create a Scheduled Task).
For more information about using the CALL command with stored procedures, see the Snowflake CALL documentation page.
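For example, once a procedure named test has been deployed (as in the next section), it can be invoked directly, or used as the body of a task after the AS keyword:
CALL test();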
Create a procedure with runtime variables¶
- Create a new Python Worksheet from the Projects >> Worksheets screen and keep the session signature unchanged:
def main(session: snowpark.Session):
- Write your code within the main function (as outlined above) and use local variables at this stage.
- When ready to deploy, click the DEPLOY button (beside the blue RUN arrow) and complete the required details. Then select "OPEN IN WORKSHEETS" to view the procedure creation code in a SQL Worksheet.
Example code:
create procedure test()
returns Table()
language python
runtime_version = 3.8
packages =('snowflake-snowpark-python')
handler = 'main'
as 'import snowflake.snowpark as snowpark
def main(session: snowpark.Session):
param = 1
df = session.table(''snowflake_sample_data.tpch_sf10.lineitem'').limit(param)
return df';
To alter this to include runtime variables, the following three steps are required:
- Remove the local Python variables
- Change the main function signature
- Change the SQL stored procedure signature
After:
create procedure test(param INT)
returns Table()
language python
runtime_version = 3.8
packages =('snowflake-snowpark-python')
handler = 'main'
as 'import snowflake.snowpark as snowpark
def main(session: snowpark.Session, param:int):
df = session.table(''snowflake_sample_data.tpch_sf10.lineitem'').limit(param)
return df';
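Once deployed, the procedure can be invoked with a runtime value for param, for example:
CALL test(5);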
Use the Data Event Trigger¶
The data tables within the CUEBIQ_DATA
database are updated with new partitions daily. A Data Event trigger is available for the following tables:
- device_location
- stop
- trajectory
To trigger your code based on the creation of new partitions in these tables, follow the steps below:
- Define the code in Python that you wish to run when triggered
- Create the following procedure from within a SQL Worksheet:
CREATE OR REPLACE SCHEMA dedicated.my_schema;

CREATE OR REPLACE PROCEDURE dedicated.my_schema.process_daily_stop(processing_date INTEGER)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = 3.10
HANDLER = 'process'
PACKAGES = ('snowflake-snowpark-python')
EXECUTE AS OWNER
AS $$
import logging

logger = logging.getLogger("process_daily_stop")

def process(session, processing_date):
    logger.info(f"processing stops {processing_date}")
    # Enter your code here
    return "DONE"
$$;
- CALL the dedicated.dedicated_event.create_event_trigger() procedure. This was created by the Cuebiq team to set up the STREAM that links the creation of the new partition with the procedure that runs your script.
CALL dedicated.dedicated_event.create_event_trigger(
    exec_on_warehouse => 'COMPUTE_WH',
    data_event_table => 'cuebiq_data.paas_cda_v3_event.stop',
    procedure_to_exec => 'dedicated.my_schema.process_daily_stop'
);
- Verify the STREAM and TASK are created by running:
SHOW STREAMS IN SCHEMA dedicated.my_schema;
SHOW TASKS IN SCHEMA dedicated.my_schema;
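To confirm the trigger actually runs once new partitions land, one option (a sketch using Snowflake's standard TASK_HISTORY table function) is to inspect recent task executions:
SELECT name, state, scheduled_time, error_message
FROM TABLE(dedicated.information_schema.task_history())
ORDER BY scheduled_time DESC
LIMIT 20;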
Import/Export a table via S3¶
Within the DEDICATED database available on Snowflake, under SPECTUS_S3_IMPORT_EXPORT, there are two mounted stages that link directly to the export and import paths within the Cuebiq-hosted S3 bucket you were provided when joining Spectus.
To unload tables from other schemas in the DEDICATED database, use the COPY INTO command.
Below is example code to unload a table from Snowflake to a single uncompressed CSV file:
COPY INTO @dedicated.spectus_s3_export.spectus_s3_export/SUBPATH/FILENAME.csv
FROM dedicated.test_schema.test_table
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' COMPRESSION = NONE)
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000
HEADER = TRUE;
Below is example code to load a single uncompressed CSV file into a Snowflake table. Note that SINGLE, MAX_FILE_SIZE, and HEADER apply only to unloading; when loading, the header row is skipped via SKIP_HEADER in the file format:
COPY INTO dedicated.test_schema.test_table
FROM @dedicated.spectus_s3_export.spectus_s3_export/SUBPATH/FILENAME.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' COMPRESSION = NONE SKIP_HEADER = 1);
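To verify that an unloaded file landed on the stage (or to see which files are available to load), you can list the stage contents:
LIST @dedicated.spectus_s3_export.spectus_s3_export/SUBPATH/;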
Best practices to query tables¶
It is a common best practice in big data environments to rely on the partition columns to speed up query execution.
If you plan to query the device_location table, please make sure to process one date at a time to avoid a long-running query.
- For all tables, if you need to run an analysis over multiple dates, please specify them within round parentheses like this ('20241201', '20241202', etc.) instead of using the BETWEEN operator.
- Always specify the country_code partition when available.
- Always specify the data_provider partition when available.
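As an illustrative sketch of these practices (the date partition column is shown here as event_date, the filter values are placeholders, and the fully qualified table name assumes the PAAS_CDA_V3 schema shown earlier; substitute the actual partition column, values, and schema for the table you are querying):
SELECT COUNT(*)
FROM cuebiq_data.paas_cda_v3.device_location
WHERE event_date IN ('20241201', '20241202')
  AND country_code = 'US'
  AND data_provider = 'PROVIDER_ID';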
Integrate with Git¶
To learn more about using Git within the Snowsight interface see the Snowflake documentation here.
To set up a Git integration, run the following SQL snippet in a Snowsight SQL Worksheet after replacing the following variables:
- INTEGRATION_NAME
- ACCOUNT_NAME
- git_username
- github_access_token
- MY_GIT_REPO
- REPO_NAME
SQL snippet:
USE ROLE USER;
USE DATABASE dedicated;
USE SCHEMA dedicated.git_integration;
CALL dedicated.git_integration.create_git_api_integration(
integration_name => 'INTEGRATION_NAME',
git_account_url => 'https://github.com/ACCOUNT_NAME'
);
CREATE OR REPLACE SECRET dedicated.git_integration.git_credentials
TYPE = password
USERNAME = 'git_username'
PASSWORD = 'github_access_token';
CREATE OR REPLACE GIT REPOSITORY dedicated.git_integration.MY_GIT_REPO
API_INTEGRATION = INTEGRATION_NAME
GIT_CREDENTIALS = git_credentials
ORIGIN = 'https://github.com/ACCOUNT_NAME/REPO_NAME.git';
SHOW GIT REPOSITORIES IN DATABASE dedicated;
ALTER GIT REPOSITORY dedicated.git_integration.my_git_repo FETCH;
SHOW GIT BRANCHES IN dedicated.git_integration.my_git_repo;
LS @dedicated.git_integration.MY_GIT_REPO/branches/master;
The setup contains a call to a stored procedure, dedicated.git_integration.create_git_api_integration. This allows you to complete the setup autonomously with your standard permissions. You will find a schema within the dedicated catalog called git_integration that houses this stored procedure.
Create an SNS Integration¶
To create an SNS integration for notifications, use the CREATE_SNS_INTEGRATION stored procedure available in the DEDICATED.PUBLIC schema. The procedure accepts three parameters:
- integration_name: The name you want to give to your SNS integration
- aws_sns_role_arn: The ARN of the IAM role that Snowflake will assume
- aws_sns_topic_arn: The ARN of the SNS topic that will receive the notifications
Example:
CALL create_sns_integration(
integration_name => 'sns_test',
aws_sns_role_arn => 'arn:aws:iam::ACCOUNT_ID:role/service-role/sns_test',
aws_sns_topic_arn => 'arn:aws:sns:REGION:ACCOUNT_ID:test-topic'
);
Then, to retrieve the external ID, run:
DESCRIBE NOTIFICATION INTEGRATION sns_test;
Create an output logger¶
The dedicated.dedicated_event.event_table table has been created to store logged events within your Snowflake instance.
To create a logger for your Python outputs, first import the logging module into your Worksheet. Additionally, you must create a logger to track your log entries. See below for example code to create your logger and add an entry:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col
import logging
def main(session: snowpark.Session):
# Your code goes here, inside the "main" handler.
logger = logging.getLogger("REPLACE_WITH_YOUR_LOGGER_NAME")
logger.info(f"Logging from Python")
Once logged, you can retrieve the logs from the event_table using SQL. See below for an example query to see your events logged in the dedicated.dedicated_event.event_table table:
SELECT
TIMESTAMP AS time,
RESOURCE_ATTRIBUTES['snow.executable.name'] as executable,
RECORD['severity_text'] AS severity,
VALUE AS message
FROM dedicated.dedicated_event.event_table
WHERE RECORD_TYPE = 'LOG' AND SCOPE['name'] = 'REPLACE_WITH_YOUR_LOGGER_NAME';
Connect via the Snowflake API¶
See the full documentation on the Spectus Snowflake API integration