# Setting up a dataset on Nuvolos

This page explains how to create datasets with specific access control settings for your team on Nuvolos.

In short, you can create datasets on Nuvolos that are visible:

* to all users in an organization,
* only to faculty users of an organization, or
* only to invited users.

## Setting up a dataset space

**Required role: organization faculty**

Create a dataset space by selecting New dataset when creating a new space. Any user with the organization faculty role can create a dataset space.

{% hint style="info" %}
Dataset spaces are special: you **cannot** run applications in them. The best way to populate a dataset space is to distribute data to it from another space.
{% endhint %}

During setup, choose the visibility of the dataset space.

* A **private** space is visible only to users who are explicitly invited, which is the default setting.
* A **faculty-only** space is visible to all faculty users in the organization.
* A **public** space is visible to all users in the organization. Public visibility does not automatically grant access to the data, but it does make users aware that the dataset space exists.

If the visibility options are not shown immediately, expand them in the space creation form with the toggle.
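The difference between *visibility* (knowing that a space exists) and *access* (reading its data) is easy to blur. The sketch below models the three visibility settings as a small Python function. It is a hypothetical illustration, not a Nuvolos API: the names `Visibility`, `User` and `is_visible` are invented here, and it assumes that every user is a member of the organization and that an explicitly invited user can always see the space.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"            # default: only explicitly invited users
    FACULTY_ONLY = "faculty-only"  # all faculty users in the organization
    PUBLIC = "public"              # all users in the organization


@dataclass(frozen=True)
class User:
    name: str
    is_faculty: bool


def is_visible(visibility: Visibility, user: User, invited: set[str]) -> bool:
    """Hypothetical model: can `user` see that the dataset space exists?

    Visibility only controls awareness of the space. Reading its contents
    still requires an appropriate role (e.g. viewer) in the space.
    """
    if user.name in invited:
        return True  # assumption: invited users always see the space
    if visibility is Visibility.FACULTY_ONLY:
        return user.is_faculty
    if visibility is Visibility.PUBLIC:
        return True
    return False  # PRIVATE: an invitation is required


bob = User("bob", is_faculty=False)
print(is_visible(Visibility.FACULTY_ONLY, bob, invited=set()))  # False
print(is_visible(Visibility.PUBLIC, bob, invited=set()))        # True
```

Note that `is_visible` returning `True` for a public space says nothing about whether the user can read the data; that is governed by the access request flow for public datasets.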

## Create your dataset

**Required role: organization faculty / space administrator in existing space**

Dataset spaces hold static data. To generate the dataset, we suggest setting up a regular research space in which you run a data pipeline, performing the analytical and transformation steps that produce the final data you want to store.

To see which tools are available for building data pipelines, refer to this guide.

## Distribute your data to the dataset space

**Required role: editor in the appropriate instance of the dataset space**

Once your pipeline has finished and produced the artefacts you want to store, [distribute your data](/features/nuvolos-basic-concepts/distribution.md) (tables, files, or both) to the dataset space. You may also want to distribute an application containing a blueprint or a software library that makes it easier to work with your data; however, the application cannot run in the dataset space.

{% hint style="info" %}
If you update the dataset regularly, we suggest clearing out its current contents before each distribution, so that the next data vintage contains no leftover artefacts from previous ones.
{% endhint %}

## Create a snapshot and name it

**Required role: editor in the appropriate instance of the dataset space**

Once the distribution has completed, [create a new snapshot](/features/snapshots/create-a-snapshot.md) of the dataset space. We recommend creating a named snapshot with a full description of the circumstances in which it was created. Datasets generated by the Nuvolos team always call these snapshots *vintages*, to highlight that the same dataset may evolve over time.
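To make "a full description of the circumstances" concrete, the sketch below builds a snapshot name and description from a few facts about the distribution. Snapshots are created through the Nuvolos interface; the helper `vintage_snapshot` and the `Vintage YYYY-MM-DD` naming format are hypothetical conventions chosen for illustration, not something Nuvolos requires.

```python
from datetime import date


def vintage_snapshot(vintage: date, source_space: str, notes: str) -> tuple[str, str]:
    """Hypothetical helper: build a (name, description) pair for a snapshot.

    The "Vintage YYYY-MM-DD" format is only an example convention.
    """
    name = f"Vintage {vintage.isoformat()}"
    description = (
        f"Data distributed from research space '{source_space}' "
        f"on {vintage.isoformat()}. {notes}"
    )
    return name, description


name, description = vintage_snapshot(
    date(2024, 3, 31),
    "pipeline",
    "Full refresh; previous contents cleared before distribution.",
)
print(name)  # Vintage 2024-03-31
```

Using an ISO date in the name keeps vintages sorted chronologically when listed alphabetically.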

## How public datasets work

Public datasets are visible to all members of an organization, but users do not automatically receive access to their contents. Instead, they are initially granted the observer role.

To access the contents of a public dataset space, users must request the viewer role. They can do this by opening the public dataset space and submitting an access request.

Once the request is submitted, the organization manager must [review and accept the request](https://docs.nuvolos.cloud/administration/organisation-management#review-requests).
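The access flow for a public dataset can be summarized as a small state machine: every member starts as an observer, submits a request, and becomes a viewer only after the organization manager accepts it. The Python sketch below is a hypothetical model of that flow (the class `AccessRequest` and its methods are invented for illustration); in practice, requests are submitted and reviewed through the Nuvolos interface.

```python
from enum import Enum


class Role(Enum):
    OBSERVER = "observer"  # can see that the public space exists
    VIEWER = "viewer"      # can read the dataset contents


class AccessRequest:
    """Hypothetical model of the access flow for a public dataset space."""

    def __init__(self, user: str) -> None:
        self.user = user
        self.role = Role.OBSERVER  # initial role for organization members
        self.pending = False

    def submit(self) -> None:
        """The user opens the public space and requests the viewer role."""
        if self.role is Role.OBSERVER:
            self.pending = True

    def review(self, accept: bool) -> None:
        """The organization manager accepts or rejects the pending request."""
        if not self.pending:
            raise ValueError("no pending request to review")
        self.pending = False
        if accept:
            self.role = Role.VIEWER

    def can_read_contents(self) -> bool:
        return self.role is Role.VIEWER


request = AccessRequest("bob")
request.submit()
request.review(accept=True)
print(request.can_read_contents())  # True
```

A rejected request leaves the user as an observer, so they still see the space but cannot read its data.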

