Spectus DataCatalog and Spectus DataCleanRoom Help Tutorials Changelog¶
This document is a changelog that captures the updates we make to both the Spectus Data Catalog and the Spectus Data Clean Room Help Tutorials.
We strive to make changes in periodic batches and provide visibility to users about those changes.
May 10, 2023¶
DATA CATALOG
Added¶
- Added the table stop to the schema vehicle_v1.
WORKBENCH HELP TUTORIALS
Added¶
- The new tutorial Stops dedicated to the newly released asset has been added. With this tutorial the vehicle stop table is presented and some use cases enabled by this asset are reported.
This involves the ExploreTheCatalog/VehicleDataAssets section.
- The new tutorial Monthly Average Daily Traffic has been added. It describes some methods to compute one of the most useful metrics in traffic analysis over a roadway of interest: the Monthly Average Daily Traffic (MADT).
This involves the UseCases section.
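For orientation only, the sketch below shows the simplest way such a metric could be computed: a plain average of the daily traffic counts within each calendar month. It is not necessarily one of the methods covered in the tutorial. The function name and the input shape (a mapping from date to daily vehicle count) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Tuple


def monthly_average_daily_traffic(
    daily_counts: Dict[date, int],
) -> Dict[Tuple[int, int], float]:
    """Average the daily counts of each (year, month) found in the input.

    Hypothetical helper: `daily_counts` maps a calendar day to the number
    of vehicles observed on the roadway of interest on that day.
    """
    by_month = defaultdict(list)
    for day, count in daily_counts.items():
        by_month[(day.year, day.month)].append(count)
    # Simple-average variant: every observed day carries the same weight.
    return {month: mean(counts) for month, counts in by_month.items()}


# Made-up counts, used only to show the call.
example = {
    date(2023, 4, 1): 100,
    date(2023, 4, 2): 200,
    date(2023, 5, 1): 300,
}
madt = monthly_average_daily_traffic(example)
```

A simple average treats every day equally, so months with missing days or an uneven mix of weekdays and weekends may call for a weighted method instead.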
Changed¶
- The tutorial Traffic Analysis has been updated to include the computation of the Monthly Average Daily Traffic with one of the methods presented in the ad hoc tutorial, over the specific road under analysis.
This involves the UseCases section.
March 20, 2023¶
WORKBENCH HELP TUTORIALS
Added¶
- The new tutorial Traffic Analysis dedicated to some of the most common traffic analyses has been added. With this tutorial the value of the vehicle assets is opened up and tangible support to actually implement well-known traffic use cases is provided.
This involves the UseCases section.
January 25, 2023¶
DATA CATALOG
Changed¶
The table vehicle_v1.trajectory has been modified as follows:
Added the new fields:
- trajectory_wkt
- start_block_group_id and end_block_group_id
- max_time_gap_seconds
- bounding_box_diagonal_meters
Removed noisy and extremely short trajectories.
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorial dedicated to the vehicle trajectory asset has been modified to introduce the new fields and the possible use cases they enable.
This involves the ExploreTheCatalog/VehicleDataAssets section.
December 22, 2022¶
DATA CATALOG
New¶
You have had the chance to get familiar with the new vehicle_location asset by working with the sample paas_samples_v2.vehicle_location. Now the two final assets, vehicle_location and trajectory, have been added to the newly created vehicle_v1 schema.
Alongside vehicle_location, the first building block of any further aggregated dataset built with connected-vehicle mobility data, this release brings trajectory, the table made to work with the journeys each vehicle makes every day.
Added¶
- Added the new schema vehicle_v1, storing the following tables:
  - vehicle_location
  - trajectory
WORKBENCH HELP TUTORIALS
Added¶
- The tutorials dedicated to the new assets vehicle_location and vehicle trajectory have been added.
This involves the ExploreTheCatalog/VehicleDataAssets section.
December 20, 2022¶
DATA CATALOG
Added¶
- Added a new table in the sample schema: paas_samples_v2.probabilistic_stoppers_hll_by_geohash
WORKBENCH HELP TUTORIALS
Changed¶
- The Densities tutorial has been updated to describe this new asset, which has the same structure as paas_cda_v3.stoppers_hll_by_geohash, and to provide quantitative test results comparing the probabilistic and the deterministic densities.
December 14, 2022¶
DATA CATALOG
New¶
A new kind of data is entering the platform. After an important collection of assets built with mobile device mobility data, we now present the first building block of a new set of core data assets: a sample of the incoming vehicle_location asset, the equivalent of device_location built from the mobility data of connected vehicles.
Added¶
- Added a new table in the sample schema: paas_samples_v2.vehicle_location
WORKBENCH HELP TUTORIALS
Added¶
- The tutorial dedicated to the new asset vehicle_location has been added. It includes a brief presentation of the sample, the crucial aspects of the new asset, and a set of initial interactions for a deeper knowledge. This involves the ExploreTheCatalog/CoreDataAssets section.
October 19, 2022¶
WORKBENCH HELP TUTORIALS
Changed¶
- Four tutorials in the Use Cases section have been updated to enable users to extract mobility insights easily and autonomously.
In detail:
- Mobility index (previously in ExploreTheCatalog/CoreDataAsset section)
- Contact index
- Visit index (previously in ExploreTheCatalog/CoreDataAsset section)
- Relocation analysis
Added¶
- One tutorial has been added to explain how to compute flows between polygons of interest:
- Traceability analysis
September 13, 2022¶
DATA CATALOG
Added¶
- Added a new table named paas_cda_v3.trajectory_uplevelled
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorial dedicated to the trajectory core data asset has been modified to include a section describing the new asset. This involves the ExploreTheCatalog/CoreDataAssets section.
Here are the main points of attention, but please refer to the tutorial for details:
- The uplevelled version of the trajectory table has two additional columns classifying the trajectory in terms of privacy: start_classification_type and end_classification_type.
- If the starting or the ending point concerns recurring areas, the full trajectory is privatized.
- The fields impacted by the privatization are start/end_lat, start/end_lng, start/end_geohash, length_meters, trajectory_wkt.
July 20, 2022¶
DATA CATALOG
New¶
In this update we are releasing a new version of the Core Data Assets schema (paas_cda_v3), made to hold the main Core Data Assets tables with an additional partition layer: the provider of the data.
Added¶
- Added the new schema paas_cda_v3, storing the following tables:
  - device_location and device_location_uplevelled
  - stop and stop_uplevelled
  - visit
  - device_recurring_area
  - device_metrics and device_metrics_uplevelled
  - geography_registry and cbsa
  - poi and poi_history
  - custom_poi and custom_poi_history
  - brand
  - device_user_labeling and segment_taxonomy
  - trajectory
  - stoppers_hll_by_geohash
From now on the EU data will be included in this schema; therefore the schema paas_cda_eu_v3 won't be part of this release or of the following ones.
Changed¶
We report here the main changes to existing Core Data Assets tables brought by this paas_cda_v3 schema release.
All the core data assets inherited from mobility data are now characterized by the additional dimension of the data provider, made available via the provider_id column. The full set of CDAs involved is:
- device_location and device_location_uplevelled
- stop and stop_uplevelled
- visit
- device_recurring_area
- trajectory
- stoppers_hll_by_geohash
The value PERSONAL_AREA of the classification_type field has been renamed to RECURRING_AREA in the device_location_uplevelled and stop_uplevelled tables.
Fully revamped the geography_registry table as follows:
- Geographies from the whole world instead of only the US.
- Different naming convention in the partition key geography_type_code: block_group → admin4.
- New types of geographies not related to the usual census division for the US: csa, cbsa, dma, timezone.
- New format to represent the admin4:
- Old format (e.g. US.CA.037.060374082122): concatenation of admin2 code (US.CA.037) and the fips code (060374082122)
- New format (e.g. US.CA.037.408212.2): concatenation of admin2 code (US.CA.037) and the remaining part of the fips code, obtained after removing the first 5 digits referring to the admin2, split by a dot after 6 digits (408212.2).
This has been done to make this representation consistent with the geography_ids of the rest of the World, represented by hierarchical sequences of characters separated by a dot symbol.
- Extended coastlines: part of the sea has been assigned to the neighbouring geographies, both to simplify them and to make it possible to assign points in the sea (at least those close to the coast) to a certain country.
- Addition of columns calculated based on GHS data, i.e. the geography population centroid coordinates (centroid_lat, centroid_lng).
- Removal of the following columns: geography_id_2, geometry_projection, geometry_geojson, geometry_wkb, census_year.
All the core data assets inherited from mobility data now refer to the new geography table,
hence a new notation for the Census block-group has been adopted.
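As a rough illustration of the admin4 format change described above, the following Python sketch converts an old-format identifier to the new one. The function name admin4_old_to_new is hypothetical, and the sketch assumes that an old identifier always ends in a 12-digit fips code, as in the example given. It is not taken from the platform's own code.

```python
def admin4_old_to_new(old_id: str) -> str:
    """Convert an old-format admin4 identifier to the new dotted format.

    Hypothetical helper, assuming the old identifier is the admin2 code
    followed by a dot and a 12-digit fips code.
    """
    # Split off the trailing fips code from the admin2 prefix.
    admin2, fips = old_id.rsplit(".", 1)
    if len(fips) != 12 or not fips.isdigit():
        raise ValueError(f"unexpected fips code in {old_id!r}")
    # Drop the first 5 digits, which repeat the admin2 information.
    remainder = fips[5:]
    # Split the remaining 7 digits with a dot after the first 6.
    return f"{admin2}.{remainder[:6]}.{remainder[6:]}"


print(admin4_old_to_new("US.CA.037.060374082122"))  # US.CA.037.408212.2
```

A mapping like this can help when comparing block-group identifiers stored under the old notation with the values found in the new geography table.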
Fully revamped the hw_with_tag table, which stores Home-Work information, as follows:
- Changed the table name to device_recurring_area.
- Added a Confidence column taking values from 0 to 1, to report the confidence of the assignment.
- The data is now refreshed every day (refer to the snapshot_event_date column to know the event date the record refers to) rather than on a weekly basis.
- Improved algorithm: please refer to the dedicated tutorial to deep-dive into the matter.
Added 5 speed columns to the trajectory tables and renamed the max_speed column to speed_gps_ms_max. The full list of speed columns is therefore:
- speed_kinematic_ms_min
- speed_kinematic_ms_avg
- speed_kinematic_ms_max
- speed_gps_ms_min
- speed_gps_ms_avg
- speed_gps_ms_max
Modified the stoppers_hll_by_geohash table, which now provides the total distinct devices information with daily and hourly granularity (no longer with monthly granularity).
Removed¶
- Removed the ip column from the device_location_uplevelled table.
- Removed the lastseen_unixdatetime column from both the device_location tables.
- Added the column os_name in all the device_location, stop and visit tables.
- Deprecated the trip table. (The trip table remains available in the paas_mi schema).
- Deprecated the device_features table.
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorials dedicated to the device_location, stop and visit core data assets have been modified to reflect the new structure in the paas_cda_v3 schema.
Here are the main points of attention, but please refer to the tutorial for details:
- The table has 3 partition keys: provider_id, processing_date, country_code, where provider_id is new and stands for the ID of the data provider.
- The delay in ingesting data differs per data provider; see section 4.1 How to fetch data of one specific event date of the tutorial dedicated to Device Location.
- The tutorial dedicated to the Home-work core data asset (now renamed device_recurring_area) has been modified to reflect the new structure in the paas_cda_v3 schema and to show the details of the new algorithm.
- The tutorial dedicated to the geography_registry core data asset has been modified to reflect the new structure in the paas_cda_v3 schema.
- All the tutorials using the core data assets mentioned above have been adjusted accordingly. This involves both the ExploreTheCatalog/CoreDataAssets section and the UseCases section.
March 31, 2022¶
DATA CATALOG
Added¶
- Added the following tables to the schemas paas_cda_eu_v2 and paas_cda_v2:
  - stoppers_metrics_by_geohash
  - stoppers_metrics_by_bing_tiles
  - stoppers_metrics_by_h3
  - stoppers_hll_by_geohash
  - stoppers_hll_by_bing_tiles
  - stoppers_hll_by_h3
- Added the paas_cda_eu.trajectory table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Density tables tutorial in the ExploreTheCatalog/CoreDataAssets section
Changed¶
- Every tutorial notebook has been reviewed and rebranded (from Cuebiq to Spectus)
December 14, 2021¶
DATA CATALOG
New¶
In this update, we are releasing a new major version of several datasets. Before providing the specifics of the new major versions, we would like to share our philosophy behind the approach.
At Cuebiq, we strive to continuously evolve and improve our data. To ensure that users have the opportunity to evaluate improvements and migrate to new datasets with minimal disruption, we will release new major versions at the schema level. We will keep previous versions of a schema available for a period of time after we release a new version. Any changes from one version to the next will be documented in this changelog and the Cuebiq Data Catalog. Users will be given plenty of advance notice before older versions are deprecated, giving them time to plan and migrate to newer versions.
A new schema version will be released when we make changes in the underlying data, remove tables or columns, or otherwise introduce breaking changes. In a new schema version, some tables may be exact copies from the previous version while others will meaningfully change. Any changes between schema versions will be clearly communicated and documented. Regardless of any changes between versions, you can be certain that tables in a given schema version will be compatible with each other (e.g., the same column present in multiple tables within a schema version will be consistent, enabling you to join across tables). If you have access to a given schema, you will automatically receive access to all supported schema versions.
We will always strive to have as few schema versions as possible to simplify users' workflows and will never have more than three versions available concurrently. When we release a new version, the format of the schema will be schema_name_v<#>, where the # represents the version number. For a new version, we will simply increment the number by 1. For example, the original version of our Core Data Assets schema is paas_cda, the next version will be paas_cda_v2, the following version will be paas_cda_v3, and so on.
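The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention as stated (an unversioned name counts as version 1, and each release increments the number by 1). The helper name next_schema_version is hypothetical and not part of any Cuebiq or Spectus tooling.

```python
import re


def next_schema_version(schema: str) -> str:
    """Return the name the next major version of a schema would take.

    Hypothetical helper: treats a name without a _v<#> suffix as the
    original version, so its successor is <name>_v2.
    """
    match = re.fullmatch(r"(.+)_v(\d+)", schema)
    if match is None:
        # Original, unversioned schema name: the next version is v2.
        return f"{schema}_v2"
    base, number = match.group(1), int(match.group(2))
    return f"{base}_v{number + 1}"


print(next_schema_version("paas_cda"))     # paas_cda_v2
print(next_schema_version("paas_cda_v2"))  # paas_cda_v3
```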
We hope this approach suits your needs. If you have any questions or feedback, please do not hesitate to contact your Cuebiq representative.
Added¶
- Added a new version of the paas_cda schema named paas_cda_v2
- Added a new version of the paas_cda_eu schema named paas_cda_eu_v2
- Added a new version of the paas_samples schema named paas_samples_v2
- Added a block_group_id column to the paas_cda_v2.stop, paas_cda_v2.stop_uplevelled, paas_cda_eu_v2.stop and paas_cda_eu_v2.stop_uplevelled tables
- Added a distributor_flag column to the paas_cda_v2.poi and paas_cda_v2.poi_history tables
- Added a new table named paas_cda_v2.trajectory
- Added a new table named paas_samples_v2.music_lovers
Changed¶
- Improved the visit algorithm being used to populate paas_cda_v2.visit
- Added the trip table to the paas_cda_v2 schema to be better aligned with the grouping of tables in our data catalog (the paas_mi.trip table will continue to be supported until further notice)
- Deprecated the table paas_meas.ltv_impression
WORKBENCH HELP TUTORIALS
Added¶
- Added the Trino UDFs tutorial to the AdvancedTopics section
- Added the Travelers Analysis to the UseCases section
- Added the Commuters Analysis to the UseCases section
September 16, 2021¶
DATA CATALOG
Changed¶
- Updated the following columns in both the paas_cda.stop and the paas_cda_eu.stop tables:
  - admin1_id
  - admin2_id
WORKBENCH HELP TUTORIALS
Added¶
- Added the Workbench Help Tutorials Index
- Added the Visit Analytics to the UseCases section
- Added the Visitor Personas to the UseCases section
Changed¶
- Updated the POIs tutorial inside the ExploreTheCatalog section to add details about the new POI table available
September 1, 2021¶
DATA CATALOG
Added¶
- Added the following tables:
  - paas_cda.stop_uplevelled
  - paas_cda_eu.stop_uplevelled
WORKBENCH HELP TUTORIALS
Added¶
- Added the Home Switchers Customisation to the UseCases section
- Added the Measurement Custom Analytics to the UseCases section
Changed¶
- Updated the Cuebiq Mobility and Visit Index tutorial inside the ExploreTheCatalog section to add details about how those CI are computed and how you can customise them. The tutorial has been split in two: a dedicated notebook is available for each index.
- Updated the How to Manage Tables tutorial in the GettingStarted section: find here how to control the size of the files you write through Trino.
- Updated the Cuebiq Data Catalog HTML file
August 13, 2021¶
DATA CATALOG
Added¶
- Added new segment values for income range in the following tables:
  - paas_cda.segment_taxonomy
  - paas_cda.device_user_labeling
Changed¶
- Changed the date column name in paas_cda.device_metrics from processing_date to local_date, since the date field is not the processing date but the actual date of reference.
Removed¶
- Removed the paas_samples.visited_admin2 table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Evacuation Rates Analysis to the UseCases section
- Added the Bias in Cuebiq Data Analysis to the UseCases section
- Added the Cuebiq Contact Index Customization tutorial to the UseCases section
Changed¶
- Updated the Socio-demographic Dataset Exploration tutorial in the ExploreTheCatalog/CoreDataAssets section
- Updated the Cuebiq Data Catalog HTML file
July 29, 2021¶
DATA CATALOG
Added¶
- Added the new schema paas_public_data
- Added the paas_public_data.census_taxonomy table
- Added the paas_public_data.census_data table
- Added the paas_cda.cbsa table
- Added the paas_samples.customers_results table
- Added the paas_samples.customers_models table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Customer Analysis to the UseCases section
- Added the Path Analysis to the UseCases section
- Added the Socio-demographic Dataset Exploration tutorial to the ExploreTheCatalog/CoreDataAssets section
- Added the PythonCartoFrame tutorial to the VisualisationToolkit section
Changed¶
- Updated the Trade Area Analysis in the UseCases section
- Updated the Manage Tables tutorial in the GettingStarted section
- Updated every tutorial notebook according to the migration to the Trino query engine (from Presto)
- Updated the Cuebiq Data Catalog HTML file
- Updated the Open Source Notices and Disclaimers document
Removed¶
- Removed the explore-the-app-gallery directory: it has been merged into the use-cases folder
June 29, 2021¶
DATA CATALOG
Added¶
- Added the os_version column to the following tables:
  - paas_cda.device_location
  - paas_cda.device_location_uplevelled
  - paas_cda_eu.device_location
  - paas_cda_eu.device_location_uplevelled
- Added the following columns to the paas_cda.poi table:
  - place_open_hours
  - place_opening_date
  - place_closing_date
- Added the paas_samples.path_analysis_covisits table
- Added the paas_samples.visited_admin2 table
- Added the paas_meas.scale_factor_ooh table
Changed¶
- Updated the paas_cda.segment_taxonomy table to filter out old segment values
Removed¶
- Removed the organization_id column from the paas_cda.poi table
- Removed the following columns from the paas_mi.home_switchers table:
  - bottom10_home_switcher_pct
  - home_switcher_pct
  - top10_home_switcher_pct
WORKBENCH HELP TUTORIALS
Added¶
- Added the Trade Area Analysis to the UseCases section
- Added the Home Switchers Dataset Exploration tutorial to the ExploreTheCatalog/Mobility section
- Added the PySpark Serverless tutorial to the AdvancedTopics section
Changed¶
- Updated the Cuebiq Data Catalog HTML file
- Updated the POIs Dataset Exploration tutorial in the ExploreTheCatalog/CoreDataAssets section