# Data integration

## What is data integration?

Nuvolos is not just an online computer lab — it is also a data platform. Data integration in Nuvolos ensures that educators and researchers can access and work with data efficiently as part of their regular workflow. Key capabilities include:

* **Native application access** — data can be queried directly from Nuvolos applications (Python, R, Stata, etc.).
* **External access** — data can be accessed from non-Nuvolos applications via standard connectors and tokens.
* **Scalable storage** — data is stored in a cloud-based, SQL-compliant data warehouse (the Scientific Data Warehouse, built on [Snowflake](https://www.snowflake.com)).
* **Access control** — access to licensed or sensitive data can be controlled at the organisation and space level.
* **Add-on databases** — applications can be extended with [add-on](https://docs.nuvolos.com/features/applications/add-ons) sidecar services such as PostgreSQL, MongoDB, MariaDB, Redis, Neo4j, OpenSearch, and PostGIS for use cases that require a dedicated database engine.
* **Standalone database servers** — you can run separate database server applications on Nuvolos and connect to them securely via Nuvolos networking. Depending on the application's visibility settings, a database server can be accessible within an instance, across an entire space, or organisation-wide. See [Connecting to apps from other applications](https://docs.nuvolos.com/applications/configuring-applications#connecting-to-apps-from-other-applications) for configuration details.

For details on connecting to data, uploading tables, and running queries, see [Database integration](https://docs.nuvolos.com/features/database-integration).

## Why is data integration useful?

Data integration enables workflows in research and education that go beyond simple file sharing.

### Data is accessible

Nuvolos lets you fine-tune access to data at both the organisation and space level. Data made public organisation-wide is viewable by any member, while restricted datasets can be made available only to specified users.

### Data is annotated

Nuvolos supports adding table and column descriptions to your data. This aids documentation and makes projects easier to reuse. You can add descriptions through the [Tables view](https://docs.nuvolos.com/features/database-integration/view-tables) or programmatically via SQL.
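As a rough illustration of the programmatic route, the sketch below builds `COMMENT ON` statements for a table and its columns. It assumes the warehouse accepts Snowflake-style `COMMENT ON TABLE` and `COMMENT ON COLUMN` syntax; the helper functions and the table and column names (`STOCK_PRICES`, `TICKER`, `CLOSE`) are hypothetical. The code only generates SQL text, so you would still run the statements through whichever connection you normally use.

```python
def sql_literal(text):
    """Return text as a single-quoted SQL string literal.

    Backslashes and single quotes are escaped, on the assumption that the
    warehouse follows Snowflake's string-literal rules.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def comment_statements(table, table_description, column_descriptions):
    """Build one COMMENT ON statement for the table and one per column.

    Identifiers are inserted as-is, so pass only trusted table and column names.
    """
    statements = [f"COMMENT ON TABLE {table} IS {sql_literal(table_description)}"]
    for column, description in column_descriptions.items():
        statements.append(
            f"COMMENT ON COLUMN {table}.{column} IS {sql_literal(description)}"
        )
    return statements


# Hypothetical table and column names, for illustration only.
statements = comment_statements(
    "STOCK_PRICES",
    "Daily closing prices from the provider's quarterly release",
    {"TICKER": "Exchange ticker symbol", "CLOSE": "Closing price in USD"},
)
for statement in statements:
    print(statement + ";")
```

Keeping descriptions in a dictionary like this makes it easy to store them alongside your code and reapply them whenever a table is recreated.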

### Data is vintaged

You can maintain multiple point-in-time versions (*vintages*) of a dataset. For example, if a data provider updates a financial dataset quarterly, you can store each quarterly release as a separate vintage. This lets you reference the exact data that was available when you ran an analysis, which is essential for replicability. Vintages are created using the [snapshot feature](https://docs.nuvolos.com/features/snapshots) and stored in [dataset spaces](https://docs.nuvolos.com/features/database-integration/create-datasets).
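The idea of referencing the data that was available at analysis time can be sketched in a few lines of Python. This is not a Nuvolos API: the release dates, vintage labels, and the `vintage_as_of` helper are hypothetical and only illustrate the lookup logic. In Nuvolos itself, each vintage would correspond to a snapshot.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly release dates mapped to vintage labels.
vintages = {
    date(2023, 3, 31): "2023Q1",
    date(2023, 6, 30): "2023Q2",
    date(2023, 9, 30): "2023Q3",
    date(2023, 12, 31): "2023Q4",
}


def vintage_as_of(vintages, analysis_date):
    """Return the label of the latest vintage released on or before analysis_date."""
    release_dates = sorted(vintages)
    # bisect_right counts the release dates that are <= analysis_date.
    position = bisect_right(release_dates, analysis_date)
    if position == 0:
        raise LookupError(f"no vintage was released on or before {analysis_date}")
    return vintages[release_dates[position - 1]]


# An analysis run in mid-August 2023 could only have seen the Q2 release.
selected = vintage_as_of(vintages, date(2023, 8, 15))
print(selected)
```

Recording the selected vintage together with your results lets someone replicating the analysis later load the same release rather than the most recent one.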

### Data can be distributed

If your own space needs data from a dataset, you can use [distribution](https://docs.nuvolos.com/features/nuvolos-basic-concepts/distribution) to copy only the tables and files you need, rather than duplicating the entire dataset.
