Spectus DataCatalog and Spectus DataCleanRoom Help Tutorials Changelog¶
This document is a changelog that captures the updates we make to both the Spectus Data Catalog and the Spectus Data Clean Room Help Tutorials.
We strive to make changes in periodic batches and provide visibility to users about those changes.
May 10, 2023¶
DATA CATALOG
Added¶
- Added the table stop to the schema vehicle_v1.
WORKBENCH HELP TUTORIALS
Added¶
- Added the new tutorial Stops, dedicated to the newly released asset. It presents the vehicle stop table and reports some use cases enabled by this asset.
This involves the ExploreTheCatalog/VehicleDataAssets section.
- Added the new tutorial Monthly Average Daily Traffic, which describes some methods to compute one of the most useful metrics for traffic analysis over a roadway of interest: the Monthly Average Daily Traffic (MADT).
This involves the UseCases section.
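As a rough illustration of the metric, the sketch below computes a MADT as the plain arithmetic mean of the daily traffic counts available in each month. This definition is an assumption made for illustration, and the tutorial may rely on different methods; the function name and the input shape are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_average_daily_traffic(daily_counts):
    """Return {(year, month): mean daily count} from {date: vehicle count}.

    Assumption: MADT is the arithmetic mean of the daily counts available
    in each month; days without a record are simply ignored.
    """
    per_month = defaultdict(list)
    for day, count in daily_counts.items():
        per_month[(day.year, day.month)].append(count)
    return {month: sum(counts) / len(counts) for month, counts in per_month.items()}


# Hypothetical daily vehicle counts for a roadway of interest.
counts = {
    date(2023, 4, 1): 1200,
    date(2023, 4, 2): 1000,
    date(2023, 4, 3): 1400,
    date(2023, 5, 1): 900,
}
madt = monthly_average_daily_traffic(counts)
```

With real data, the daily counts would first be derived from the vehicle assets for the road under analysis.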
Changed¶
- The tutorial Traffic Analysis has been updated to include the computation of the Monthly Average Daily Traffic with one of the methods presented in the ad hoc tutorial, over the specific road under analysis.
This involves the UseCases section.
March 20, 2023¶
WORKBENCH HELP TUTORIALS
Added¶
- Added the new tutorial Traffic Analysis, dedicated to some of the most common traffic analyses. It showcases the value of the vehicle assets and provides practical support for implementing well-known traffic use cases.
This involves the UseCases section.
January 25, 2023¶
DATA CATALOG
Changed¶
The table vehicle_v1.trajectory has been modified as follows:
Added the new fields:
- trajectory_wkt
- start_block_group_id and end_block_group_id
- max_time_gap_seconds
- bounding_box_diagonal_meters
Removed noisy and extremely short trajectories.
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorial dedicated to the vehicle trajectory asset has been modified to introduce the new fields and the possible use cases they enable.
This involves the ExploreTheCatalog/VehicleDataAssets section.
December 22, 2022¶
DATA CATALOG
New¶
You have had the chance to get familiar with the new vehicle_location asset by experimenting with the sample paas_samples_v2.vehicle_location. Now the two final assets, vehicle_location and trajectory, have been added to the newly created vehicle_v1 schema.
Together with the first building block of any further aggregated dataset built on connected-vehicle mobility data, this release also includes the table designed for working with the journeys each vehicle makes every day.
Added¶
- Added the new schema vehicle_v1 storing the following tables:
  - vehicle_location
  - trajectory
WORKBENCH HELP TUTORIALS
Added¶
- The tutorials dedicated to the new assets vehicle_location and vehicle trajectory have been added.
This involves the ExploreTheCatalog/VehicleDataAssets section.
December 20, 2022¶
DATA CATALOG
Added¶
- Added a new table in the sample schema: paas_samples_v2.probabilistic_stoppers_hll_by_geohash
WORKBENCH HELP TUTORIALS
Changed¶
- The Densities tutorial has been updated to describe this new asset, which has the same structure as paas_cda_v3.stoppers_hll_by_geohash, and to provide quantitative test results comparing the probabilistic and the deterministic densities.
December 14, 2022¶
DATA CATALOG
New¶
A new kind of data is entering the platform. After an important collection of assets built with mobile device mobility data, we now present the first building block of a new set of core data assets: a sample of the upcoming vehicle_location asset, namely the 'equivalent' of device_location but built from the mobility data of connected vehicles.
Added¶
- Added a new table in the sample schema: paas_samples_v2.vehicle_location
WORKBENCH HELP TUTORIALS
Added¶
- The tutorial dedicated to the new asset vehicle_location has been added. It includes a brief presentation of the sample, the crucial aspects of the new asset and a set of initial interactions to get to know it better.
This involves the ExploreTheCatalog/CoreDataAssets section.
October 19, 2022¶
WORKBENCH HELP TUTORIALS
Changed¶
- Four tutorials in the Use Cases section have been updated to enable users to extract mobility insights easily and autonomously.
In detail:
- Mobility index (previously in ExploreTheCatalog/CoreDataAsset section)
- Contact index
- Visit index (previously in ExploreTheCatalog/CoreDataAsset section)
- Relocation analysis
Added¶
- One tutorial has been added to explain how to compute flows between polygons of interest:
- Traceability analysis
September 13, 2022¶
DATA CATALOG
Added¶
- Added a new table named paas_cda_v3.trajectory_uplevelled
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorial dedicated to the trajectory core data asset has been modified to include a section describing the new asset. This involves the ExploreTheCatalog/CoreDataAssets section.
Here are the main points of attention; please refer to the tutorial for details:
- The uplevelled version of the trajectory table has two additional columns classifying the trajectory in terms of privacy: start_classification_type and end_classification_type.
- If the starting or the ending point falls in a recurring area, the full trajectory is privatized.
- The fields impacted by the privatization are start/end_lat, start/end_lng, start/end_geohash, length_meters, trajectory_wkt.
July 20, 2022¶
DATA CATALOG
New¶
In this update we are releasing a new version of the Core Data Assets schema (paas_cda_v3), which holds the main Core Data Assets tables with an additional partition layer: the provider of the data.
Added¶
- Added the new schema paas_cda_v3 storing the following tables:
  - device_location and device_location_uplevelled
  - stop and stop_uplevelled
  - visit
  - device_recurring_area
  - device_metrics and device_metrics_uplevelled
  - geography_registry and cbsa
  - poi and poi_history
  - custom_poi and custom_poi_history
  - brand
  - device_user_labeling and segment_taxonomy
  - trajectory
  - stoppers_hll_by_geohash
From now on the EU data will be included in this schema; therefore the schema paas_cda_eu_v3 won’t be part of this release, nor of the following ones.
Changed¶
We report here the main changes to existing Core Data Assets tables brought by this paas_cda_v3 schema release.
All the core data assets inherited from mobility data are now characterized by the additional dimension of the data provider, made available via the provider_id column. The full set of involved CDAs is:
- device_location and device_location_uplevelled
- stop and stop_uplevelled
- visit
- device_recurring_area
- trajectory
- stoppers_hll_by_geohash
The value PERSONAL_AREA of the classification_type field has been renamed to RECURRING_AREA in the device_location_uplevelled and stop_uplevelled tables.
Fully revamped the geography_registry table as follows:
- Geographies from the whole world instead of only the US.
- Different naming convention in the partition key geography_type_code: block_group → admin4.
- New types of geographies not related to the usual census division for the US: csa, cbsa, dma, timezone.
- New format to represent the admin4:
- Old format (e.g. US.CA.037.060374082122): concatenation of admin2 code (US.CA.037) and the fips code (060374082122)
- New format (e.g. US.CA.037.408212.2): concatenation of admin2 code (US.CA.037) and the remaining part of the fips code, obtained after removing the first 5 digits referring to the admin2, split by a dot after 6 digits (408212.2).
This has been done to make this representation consistent with the geography_ids of the rest of the world, represented by hierarchical sequences of characters separated by a dot symbol.
- Extended coastlines: part of the sea has been assigned to the neighbouring geographies, both to simplify them and to be able to assign points in the sea (at least those close to the coast) to a certain country.
- Addition of columns calculated based on GHS data, i.e. the geography population centroid coordinates (centroid_lat, centroid_lng).
- Removal of the following columns: geography_id_2, geometry_projection, geometry_geojson, geometry_wkb, census_year.
All the core data assets inherited from mobility data now refer to the new geography table,
hence a new notation for the Census block-group has been adopted.
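The change of admin4 notation described above can be sketched as a small conversion helper. This is only an illustration built from the single example given in this changelog: the function name is hypothetical, and the check that the fips code has 12 digits is an assumption based on that example.

```python
def admin4_old_to_new(old_id: str) -> str:
    """Convert an old-format US admin4 id to the new format.

    Old: admin2 code + "." + fips code, e.g. US.CA.037.060374082122
    New: admin2 code + "." + fips code without its first 5 digits,
         split by a dot after 6 digits, e.g. US.CA.037.408212.2
    """
    admin2, _, fips = old_id.rpartition(".")
    # Assumption: the fips code is a 12-digit string, as in the example.
    if len(fips) != 12 or not fips.isdigit():
        raise ValueError(f"unexpected fips code in {old_id!r}")
    remainder = fips[5:]  # drop the 5 digits referring to the admin2
    return f"{admin2}.{remainder[:6]}.{remainder[6:]}"


converted = admin4_old_to_new("US.CA.037.060374082122")
```

A helper of this kind may be useful when joining block-group ids stored in the old notation against the new geography_registry table.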
Fully revamped the hw_with_tag table storing info about Home-Work as follows:
- Changed the table name to device_recurring_area
- Added a Confidence column taking values from 0 to 1, to report the confidence of the assignment
- The data is now refreshed every day (refer to the snapshot_event_date column to know the event date the record refers to) rather than on a weekly basis
- Improved algorithm: please refer to the dedicated tutorial for a deep dive into the matter.
Added 5 speed columns to the trajectory tables and renamed the max_speed column to speed_gps_ms_max. The full list of speed columns is therefore:
- speed_kinematic_ms_min
- speed_kinematic_ms_avg
- speed_kinematic_ms_max
- speed_gps_ms_min
- speed_gps_ms_avg
- speed_gps_ms_max
Modified the stoppers_hll_by_geohash table, which now provides the total distinct devices information with daily and hourly granularity (no longer with monthly granularity).
Removed¶
- Removed the ip column from the device_location_uplevelled table.
- Removed the lastseen_unixdatetime column from both the device_location tables.
- Added the column os_name in all the device_location, stop and visit tables.
- Deprecated the trip table. (The trip table remains available in the paas_mi schema).
- Deprecated the device_features table.
WORKBENCH HELP TUTORIALS
Changed¶
- The tutorials dedicated to the device_location, stop and visit core data assets have been modified to reflect the new structure in the paas_cda_v3 schema.
Here are the main points of attention; please refer to the tutorial for details:
- The table has 3 partition keys: provider_id, processing_date, country_code, where provider_id is new and stands for the ID of the data provider.
- The delay in ingesting data is different per data provider – see section 4.1 How to fetch data of one specific event date of the tutorial dedicated to Device Location.
- The tutorial dedicated to the Home-work core data asset, now renamed device_recurring_area, has been modified to reflect the new structure in the paas_cda_v3 schema and to show the detail of the new algorithm.
- The tutorial dedicated to the geography_registry core data asset has been modified to reflect the new structure in the paas_cda_v3 schema.
- All the tutorials using the core data assets mentioned above have been adjusted accordingly. This involves both the ExploreTheCatalog/CoreDataAssets section and the UseCases section.
March 31, 2022¶
DATA CATALOG
Added¶
- Added the following tables to the schemas paas_cda_eu_v2 and paas_cda_v2:
  - stoppers_metrics_by_geohash
  - stoppers_metrics_by_bing_tiles
  - stoppers_metrics_by_h3
  - stoppers_hll_by_geohash
  - stoppers_hll_by_bing_tiles
  - stoppers_hll_by_h3
- Added the paas_cda_eu.trajectory table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Density tables tutorial in the ExploreTheCatalog/CoreDataAssets section
Changed¶
- Every tutorial notebook has been reviewed and rebranded (from Cuebiq to Spectus)
December 14, 2021¶
DATA CATALOG
New¶
In this update, we are releasing a new major version of several datasets. Before providing the specifics of the new major versions, we would like to share our philosophy behind the approach.
At Cuebiq, we strive to continuously evolve and improve our data. To ensure that users have the opportunity to evaluate improvements and migrate to new datasets with minimal disruption, we will release new major versions at the schema level. We will keep previous versions of a schema available for a period of time after we release a new version. Any changes from one version to the next will be documented in this changelog and the Cuebiq Data Catalog. Users will be given plenty of advance notice before older versions are deprecated, so that they have time to plan and migrate to newer versions.
A new schema version will be released when we make changes in the underlying data, remove tables or columns, or otherwise introduce breaking changes. In a new schema version, some tables may be exact copies from the previous version while others will meaningfully change. Any changes between schema versions will be clearly communicated and documented. Regardless of any changes between versions, you can be certain that tables in a given schema version will be compatible with each other (e.g., the same column present in multiple tables within a schema version will be consistent, enabling you to join across tables). If you have access to a given schema, you will automatically receive access to all supported schema versions.
We will always strive to have as few schema versions as possible to simplify users' workflows and will never have more than three versions available concurrently. When we release a new version, the format of the schema name will be schema_name_v<#>, where # represents the version number. For a new version, we will simply increment the number by 1. For example, the original version of our Core Data Assets schema is paas_cda, the next version will be paas_cda_v2, the following version will be paas_cda_v3, and so on.
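The naming convention above can be sketched as a small helper that derives the next schema name from the current one. The function name is hypothetical, and the sketch assumes that a schema name without a _v<#> suffix is the original version, so that its successor is _v2, as in the paas_cda example.

```python
import re


def next_schema_version(schema_name: str) -> str:
    """Return the name of the next major version of a schema.

    Assumption: a name without a trailing _v<#> is the original version,
    whose successor is <name>_v2; otherwise the number is incremented by 1.
    """
    match = re.fullmatch(r"(.+)_v(\d+)", schema_name)
    if match is None:
        return f"{schema_name}_v2"
    base, number = match.group(1), int(match.group(2))
    return f"{base}_v{number + 1}"


second = next_schema_version("paas_cda")
third = next_schema_version(second)
```

The helper only illustrates the naming rule; it does not say which schema versions are actually released.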
We hope this approach suits your needs. If you have any questions or feedback, please do not hesitate to contact your Cuebiq representative.
Added¶
- Added a new version of the paas_cda schema named paas_cda_v2
- Added a new version of the paas_cda_eu schema named paas_cda_eu_v2
- Added a new version of the paas_samples schema named paas_samples_v2
- Added a block_group_id column to paas_cda_v2.stop, paas_cda_v2.stop_uplevelled, paas_cda_eu_v2.stop and paas_cda_eu_v2.stop_uplevelled tables
- Added a distributor_flag column to paas_cda_v2.poi and paas_cda_v2.poi_history tables
- Added a new table named paas_cda_v2.trajectory
- Added a new table named paas_samples_v2.music_lovers
Changed¶
- Improved the visit algorithm being used to populate paas_cda_v2.visit
- Added the trip table to paas_cda_v2 schema to be better aligned with the grouping of tables in our data catalog
** The paas_mi.trip will continue to be supported until further notice
- Deprecated the table paas_meas.ltv_impression
WORKBENCH HELP TUTORIALS
Added¶
- Added the Trino UDFs tutorial to the AdvancedTopics section
- Added the Travelers Analysis to the UseCases section
- Added the Commuters Analysis to the UseCases section
September 16, 2021¶
DATA CATALOG
Changed¶
- Updated the two following columns in both the paas_cda.stop and the paas_cda_eu.stop tables:
  - admin1_id
  - admin2_id
WORKBENCH HELP TUTORIALS
Added¶
- Added the Workbench Help Tutorials Index
- Added the Visit Analytics to the UseCases section
- Added the Visitor Personas to the UseCases section
Changed¶
- Updated the POIs tutorial inside the ExploreTheCatalog section to add details about the new POI table available
September 1, 2021¶
DATA CATALOG
Added¶
- Added the following tables:
  - paas_cda.stop_uplevelled
  - paas_cda_eu.stop_uplevelled
WORKBENCH HELP TUTORIALS
Added¶
- Added the Home Switchers Customisation to the UseCases section
- Added the Measurement Custom Analytics to the UseCases section
Changed¶
- Updated the Cuebiq Mobility and Visit Index tutorial inside the ExploreTheCatalog section to add details about how those CI are computed and how you can customise them. The tutorial has been split in two: a dedicated notebook is available for each index.
- Updated the How to Manage Tables tutorial in the GettingStarted section: find here how to control the size of the files you write through Trino.
- Updated the Cuebiq Data Catalog HTML file
August 13, 2021¶
DATA CATALOG
Added¶
- Added new segment values for income range in the following tables:
  - paas_cda.segment_taxonomy
  - paas_cda.device_user_labeling
Changed¶
- Changed the date column name in paas_cda.device_metrics from processing_date to local_date, since the date field is not the processing date but the actual date of reference.
Removed¶
- Removed the paas_samples.visited_admin2 table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Evacuation Rates Analysis to the UseCases section
- Added the Bias in Cuebiq Data Analysis to the UseCases section
- Added the Cuebiq Contact Index Customization tutorial to the UseCases section
Changed¶
- Updated the Socio-demographic Dataset Exploration tutorial in the ExploreTheCatalog/CoreDataAssets section
- Updated the Cuebiq Data Catalog HTML file
July 29, 2021¶
DATA CATALOG
Added¶
- Added the new schema paas_public_data
- Added the paas_public_data.census_taxonomy table
- Added the paas_public_data.census_data table
- Added the paas_cda.cbsa table
- Added the paas_samples.customers_results table
- Added the paas_samples.customers_models table
WORKBENCH HELP TUTORIALS
Added¶
- Added the Customer Analysis to the UseCases section
- Added the Path Analysis to the UseCases section
- Added the Socio-demographic Dataset Exploration tutorial to the ExploreTheCatalog/CoreDataAssets section
- Added the PythonCartoFrame tutorial to the VisualisationToolkit section
Changed¶
- Updated the Trade Area Analysis in the UseCases section
- Updated the Manage Tables tutorial in the GettingStarted section
- Updated every tutorial notebook according to the migration to the Trino query engine (from Presto)
- Updated the Cuebiq Data Catalog HTML file
- Updated the Open Source Notices and Disclaimers document
Removed¶
- Removed the explore-the-app-gallery directory: it has been merged into the use-cases folder
June 29, 2021¶
DATA CATALOG
Added¶
- Added the os_version column to the following tables:
  - paas_cda.device_location
  - paas_cda.device_location_uplevelled
  - paas_cda_eu.device_location
  - paas_cda_eu.device_location_uplevelled
- Added the following columns to the paas_cda.poi table:
  - place_open_hours
  - place_opening_date
  - place_closing_date
- Added the paas_samples.path_analysis_covisits table
- Added the paas_samples.visited_admin2 table
- Added the paas_meas.scale_factor_ooh table
Changed¶
- Updated the paas_cda.segment_taxonomy table to filter out old segment values
Removed¶
- Removed the organization_id column from the paas_cda.poi table
- Removed the following columns from the paas_mi.home_switchers table:
  - bottom10_home_switcher_pct
  - home_switcher_pct
  - top10_home_switcher_pct
WORKBENCH HELP TUTORIALS
Added¶
- Added the Trade Area Analysis to the UseCases section
- Added the Home Switchers Dataset Exploration tutorial to the ExploreTheCatalog/Mobility section
- Added the PySpark Serverless tutorial to the AdvancedTopics section
Changed¶
- Updated the Cuebiq Data Catalog HTML file
- Updated the POIs Dataset Exploration tutorial in the ExploreTheCatalog/CoreDataAssets section