Setting up a dataset on Nuvolos
This page explains how to create datasets with specific access control settings for your team on Nuvolos.
In short, a dataset on Nuvolos can be:
visible to all users in an organization,
visible only to faculty users of an organization,
or visible only to invited users.
Setting up a dataset space
Required role: organization faculty
Create a dataset space by selecting New dataset when creating a new space. Any user with the organization faculty role can create a dataset space.
Dataset spaces are special: you cannot run applications in them. The best way to populate a dataset space is to distribute to it.
During setup, choose the visibility of the dataset space.
A private space is visible only to users who are explicitly invited. This is the default setting.
A faculty-only space is visible to all faculty users in the organization.
A public space is visible to all users in the organization. Public visibility does not automatically grant access to the data, but it does make users aware that the dataset space exists.
If the visibility options are not shown immediately, expand them in the space creation form with the toggle.
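The three visibility settings can be summarized in a small Python sketch. It is only a model of the rules described above, not part of any Nuvolos API: the names Visibility and can_see_space are hypothetical, and treating invited users as always able to see the space is an assumption.

```python
from enum import Enum


class Visibility(Enum):
    """Hypothetical labels for the three visibility settings."""
    PRIVATE = "private"            # default: explicitly invited users only
    FACULTY_ONLY = "faculty-only"  # all faculty users in the organization
    PUBLIC = "public"              # all users in the organization


def can_see_space(visibility: Visibility, is_faculty: bool, is_invited: bool) -> bool:
    """Return True if an organization member can see that the space exists.

    Seeing a space is not the same as having access to its data.
    """
    if is_invited:
        # Assumption: an explicit invitation always makes the space visible.
        return True
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.FACULTY_ONLY:
        return is_faculty
    # PRIVATE: not invited, so not visible.
    return False
```

For example, can_see_space(Visibility.FACULTY_ONLY, is_faculty=False, is_invited=False) evaluates to False, while the same call with Visibility.PUBLIC evaluates to True.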
Create your dataset
Required role: organization faculty / space administrator in existing space
Dataset spaces hold static information. To generate the dataset, we suggest setting up a regular research space in which you run a data pipeline and perform the analytical and transformation steps that produce the final state of the data you want to store.
To see which tools are available for data pipelines, please refer to this guide.
Distribute your data to the dataset space
Required role: editor in appropriate instance of dataset space
Once your pipeline has finished, the artefacts you want to store are available. Distribute your data (tables, files, or a combination of the two) to the dataset space. You may also want to distribute an app that contains a blueprint or a software library to facilitate interaction with your data. Note, however, that the app cannot run in the dataset space.
If you update the dataset regularly, we suggest cleaning up the current state before distributing, so that the next data vintage is free of artefacts from previous ones.
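For file-based data, the clean-up step might look like the following minimal sketch. It assumes the files you want to replace live in an ordinary directory that you are allowed to modify. The helper name clear_directory is hypothetical, and nothing here is a Nuvolos-specific call.

```python
import shutil
from pathlib import Path


def clear_directory(target: Path) -> int:
    """Delete everything inside `target` but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    # Materialize the listing first so we never delete while iterating.
    for entry in list(target.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subdirectory and its contents
        else:
            entry.unlink()         # remove a file or a symlink
        removed += 1
    return removed
```

Clearing the directory before writing the new files means that a file dropped from the pipeline output does not linger in the next vintage.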
Create a snapshot and name it
Required role: editor in appropriate instance of dataset space
Once the distribution is complete, create a new snapshot of the dataset space. We generally suggest creating a named snapshot with a full description of the circumstances in which it was created. Datasets generated by the Nuvolos team always call these snapshots vintages, to highlight that the same dataset may evolve over time.
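One way to keep snapshot names and descriptions consistent across updates is to generate them from a few fields. The sketch below is an illustration only: the function vintage_snapshot, the name format, and the description layout are assumptions, not a convention prescribed by Nuvolos.

```python
from datetime import date


def vintage_snapshot(dataset: str, as_of: date, source: str, notes: str) -> tuple[str, str]:
    """Build a (name, description) pair for a named snapshot.

    The date is written in ISO format so that names sort chronologically.
    """
    stamp = as_of.isoformat()
    name = f"{dataset}_vintage_{stamp}"
    description = f"Data as of {stamp}. Source: {source}. {notes}"
    return name, description
```

The resulting strings can then be entered as the snapshot name and description when the snapshot is created.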
How public datasets work
Public datasets are visible to all members of an organization, but users do not automatically receive access to their contents. Instead, they are initially granted the observer role.
To access the contents of a public dataset space, users must request the viewer role. They can do this by opening the public dataset space and submitting an access request.
Once the request is submitted, the manager of the organization must review and accept it.
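The request flow described above can be pictured as a small state model. This is a conceptual sketch, not Nuvolos code: the class Membership and the functions request_access and review_request are hypothetical names, and what happens to a rejected request (the user stays an observer) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Membership:
    """A user's standing in a public dataset space."""
    user: str
    role: str = "observer"        # initial role: can see the space, not its contents
    request_pending: bool = False


def request_access(membership: Membership) -> None:
    """The user asks for the viewer role."""
    if membership.role == "observer":
        membership.request_pending = True


def review_request(membership: Membership, approved: bool) -> None:
    """The organization manager accepts or rejects a pending request."""
    if not membership.request_pending:
        raise ValueError("no pending access request to review")
    membership.request_pending = False
    if approved:
        membership.role = "viewer"
    # Assumption: a rejected user simply remains an observer.
```

The point of the model is that visibility and access are separate: a user only moves from observer to viewer after an explicit request and an explicit approval.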