Platform User Guide¶
Overview¶
- Using the sidebar, you can easily access the platform's robust functionalities, divided into main categories:
    - Build
        - Jupyter Notebook: Primary data interaction and workflow programming tool (see JupyterLab for more details)
        - Snowflake: Launch Snowsight from Snowflake (see Snowflake User Guide for more details)
    - Visualize
        - My Apps: Catalog of demo applications, along with cleanroom tutorials
- Tiles on the home screen provide quick, easy access to the primary platform tools.
- Links in the top right corner provide convenient access to our documentation and support portal from within the platform.
- A User Account menu.
JupyterLab¶
Notebooks¶
Create and share documents that contain live code, visualizations, and documentation with notebooks.
We offer JupyterLab as the notebook interface because it is the most widely used, open-source notebook environment. To learn more about how to use JupyterLab, check out the documentation here.
You can create a new notebook from the Launcher in JupyterLab.
Note: When creating new notebooks or folders within Jupyter, please do not include any spaces in their names, as spaces will cause errors when running notebooks through Dataflow Studio jobs. Instead, we recommend using hyphens or underscores to separate words.
Default Configurations¶
Each user has their own dedicated JupyterLab instance with the following resources:
- 2 CPUs
- 8 GB RAM
- 50 GB Disk
Each session has a maximum duration of 10 hours. After 10 hours, your session will expire, and any queries you attempt to run will return errors. To start a new session, you must restart the server by following these steps:
- Go to "File"
- Click on "Hub Control Panel"
- Click on "Stop My Server"
- Click on "Start My Server"
- Click on "Launch Server"
Query Engine¶
Run queries and code developed from notebooks using our high-performance big data query engine.
A Snowflake integration powers our main SQL engine on the platform. Through this integration, your queries execute on the Snowflake query engine. To learn more about our integration of Snowflake, see the Snowflake User Guide.
Connect to Snowflake SQL Engine¶
SQL queries through the JupyterLab interface are powered by the Snowflake SQL query engine.
To connect to Snowflake, simply copy and paste the contents below into a notebook cell and run it. You will need to run this cell within each notebook each time you start a session on the platform.
```python
%load_ext cuebiqmagic.magics
%init_cuebiq_data
snow_engine = get_ipython().user_ns['instance']
```
Once you run the cell, you should receive a "Connected" message.
Tips and Tricks¶
To comment in a SQL cell, you must enclose the comment in /* and */.
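For example, a minimal sketch of this comment style (the table and column names below are hypothetical placeholders, not actual platform tables):

```sql
/* This is a block comment: count rows per day.
   Replace the table name with one from your own workspace. */
SELECT event_date, COUNT(*) AS row_count
FROM my_schema.my_table /* inline comments also use this syntax */
GROUP BY event_date
```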
Notebook Tutorials¶
Our Data Science team is continually developing and releasing notebook tutorials for Spectus users. The goal of these tutorials is to pass on Spectus' institutional knowledge to users, allowing you to accelerate the time it takes you to gain insight and drive value with the Spectus Data Clean Room. The notebook tutorials are available to all Spectus users within their Jupyter Notebook interface under the data-clean-room-help folder, and also via the App Gallery in Documentation >> Tutorials.
For easier discovery and navigation, we recently introduced an Index of all tutorials under the data-clean-room-help folder.
Dedicated Workspace¶
Create and manage tables and save your work within your Dedicated Workspace. Each organization is provided with its own Dedicated Workspace that remains private to that organization. All users with access to the Platform can access the organization’s Dedicated Workspace. To avoid hitting the 50GB limit on the local disk of your Jupyter instance, we highly recommend using your dedicated workspace to store your data.
We leverage Snowflake as the data warehouse software to facilitate reading, writing, and managing large datasets within the Dedicated Workspace. Please speak with your Customer Success Manager to determine which data warehouse to utilize.
Within this Dedicated Workspace, Platform users can create and delete schemas inside their dedicated catalog, which helps different users organize their work across the organization. Users can also create and drop tables in the dedicated catalog, as well as insert data into any tables they create.
Any user within an organization can read and insert data into any table within the organization’s dedicated catalog. For more guidance on how to create and manage tables within the Dedicated Workspace please check out the notebook tutorial on this located in the following folder: data-clean-room-help/getting-started/01_manage_tables.ipynb.
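As a rough sketch, the schema and table operations described above might look like the following in Snowflake SQL. The catalog, schema, and table names here are hypothetical placeholders; refer to the tutorial notebook for the exact conventions used on the platform.

```sql
/* All names below are hypothetical placeholders */
CREATE SCHEMA IF NOT EXISTS my_org_catalog.analytics;

CREATE TABLE IF NOT EXISTS my_org_catalog.analytics.daily_counts (
    event_date DATE,
    row_count  NUMBER
);

INSERT INTO my_org_catalog.analytics.daily_counts
    VALUES ('2024-01-01', 42);

/* Drop the table when it is no longer needed */
DROP TABLE my_org_catalog.analytics.daily_counts;
```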
Support for git¶
Collaborate on code with users in your organization using native support for git hosting services such as GitHub within the Spectus Data Clean Room.
JupyterLab offers a git plugin that makes it easy to interact with your repositories. Based on your git server configuration, you can access your repositories using HTTPS or through an SSH connection. Instructions on how to do so can be found here.