# Import data

<mark style="color:$primary;">**Outcome**</mark>\
You bring external data into Nuvolos using the import method that matches the data source.

<mark style="color:$primary;">**Before you start**</mark>

* You know where the data currently lives (local files, external storage, a public URL, or another database).
* You have decided whether the data will land in the Nuvolos file system, the data warehouse, or Large File Storage.

#### Where the data will land

Three storage targets are available in Nuvolos. Pick the one that matches your data size and usage pattern:

* **Nuvolos file system** - for flat files (CSV, [Parquet](https://parquet.apache.org/), code, documents). Suitable for the 1 GB–100 GB range.
* **Scientific Data Warehouse** - for tabular data that benefits from SQL queries and joins. Suitable for datasets where row-level access patterns matter.
* **Large File Storage** - for very large files (above 100 GB) that do not change often. See [Reference › File Storage](/reference/file-system-and-storage/large-file-storage.md).
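The rules of thumb above can be sketched as a small decision helper. This is an illustration only: `pick_storage_target` and its parameters are hypothetical names, the 100 GB cut-off comes from the list above rather than from an enforced limit, and giving SQL needs priority over file size is an assumption.

```python
def pick_storage_target(size_gb: float, needs_sql: bool = False) -> str:
    """Suggest a storage target from the rules of thumb listed above."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if needs_sql:
        # Tabular data that benefits from SQL queries and joins.
        return "Scientific Data Warehouse"
    if size_gb > 100:
        # Very large files that do not change often.
        return "Large File Storage"
    # Flat files such as CSV, Parquet, code, and documents.
    return "Nuvolos file system"
```

For example, `pick_storage_target(20)` suggests the file system, while `pick_storage_target(500)` suggests Large File Storage.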

### Data pipelines (ETL)

A data pipeline is a chain of processes that extracts data from a source and stores it in a target. Typical sources include files over the internet, web-scraped pages, or existing databases.
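As a minimal sketch of the extract, transform, and load steps, the example below uses only the Python standard library. The sample data, the `readings` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the real target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, URL, or database.
RAW_CSV = "station,temp_c\nA,21.5\nB,\nC,19.0\n"


def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple[str, float]]:
    """Transform: drop rows with a missing temperature and convert types."""
    return [(r["station"], float(r["temp_c"])) for r in rows if r["temp_c"]]


def load(records: list[tuple[str, float]], conn: sqlite3.Connection) -> int:
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")  # stand-in target, not the Nuvolos warehouse
loaded = load(transform(extract(RAW_CSV)), conn)
print(f"Loaded {loaded} rows")
```

Keeping the three stages as separate functions makes each one easy to test and to swap out when the source or the target changes.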

To run data pipelines on Nuvolos, we recommend the [Apache Airflow](/how-to-guides/application-specific-guides/apache-airflow.md) application. Airflow lets you build complex workflows as directed acyclic graphs that mix scripting languages and command-line operations, with built-in monitoring, failover, and scheduling.
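To illustrate what a directed acyclic graph of tasks means in practice, the sketch below orders a few hypothetical tasks with `graphlib` from the Python standard library. It does not use the Airflow API; it only shows how dependencies determine a valid execution order.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it (hypothetical names).
dependencies = {
    "download": set(),
    "clean": {"download"},
    "aggregate": {"clean"},
    "validate": {"clean"},
    "publish": {"aggregate", "validate"},
}

# static_order() yields tasks so that every dependency comes first;
# it raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Here `aggregate` and `validate` depend only on `clean`, so a scheduler is free to run them in either order or in parallel before `publish`.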

For Airflow best practices, see the [Apache Airflow documentation](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html). For application-specific guidance on running Airflow on Nuvolos, see [Reference › Applications](/reference/applications.md).

### File uploads

There are two ways to upload files to Nuvolos:

* **The Files view in the web UI** - best for individual files up to a few hundred MB.
* **Application-specific upload UIs** - sometimes more suitable for larger files. For example, RStudio and JupyterLab both have built-in upload features that handle interrupted uploads better than the web UI for very large files.

### Download directly from a URL

If the data you need is available at a public URL, do not download it to your computer first. Use `wget` (available in every Nuvolos application) from a terminal:

{% code overflow="wrap" %}

```bash
wget <url>
```

{% endcode %}

See the [wget documentation](https://man7.org/linux/man-pages/man1/wget.1.html) for options.
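If the download needs to happen inside a script or notebook rather than a terminal, a roughly equivalent sketch using only the Python standard library might look like this. The `download` helper and the destination path are hypothetical names chosen for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path) -> Path:
    """Stream the content at `url` to `dest` without loading it all into memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so large files do not have to fit in memory.
        shutil.copyfileobj(response, out)
    return dest
```

A call such as `download("<url>", Path("data/raw.csv"))` writes the file into the application's working directory. Unlike `wget`, this sketch does not retry or resume interrupted transfers.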

### Mount external storage

Many cloud storage services can be mounted as folders inside your applications. The supported list includes Amazon S3, Azure Files, Dropbox, Google Drive, Box, Mega, and SharePoint Online. Mounting avoids the need to copy data into Nuvolos - the data stays in its origin storage and is accessed live by applications.

For the full list of mountable services, see the [rclone overview](https://rclone.org/overview/). For the specific Nuvolos integrations, see [Reference › File Storage](/reference/file-system-and-storage.md) - the connector pages cover Dropbox sync, S3 buckets via rclone, SSHFS, and SharePoint.
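Once a service is mounted, applications can read it with ordinary file APIs. The sketch below lists data files under a mount point; the `list_data_files` helper is a hypothetical name, and the actual mount path depends on how the storage was mounted.

```python
from pathlib import Path


def list_data_files(mount_point: Path, pattern: str = "*.csv") -> list[Path]:
    """Return files under a mounted folder that match `pattern`, sorted by path."""
    if not mount_point.is_dir():
        raise FileNotFoundError(f"{mount_point} is not mounted or does not exist")
    # rglob searches the mount point and all of its subfolders.
    return sorted(p for p in mount_point.rglob(pattern) if p.is_file())
```

Because the data is accessed live from its origin storage, a recursive listing over a large remote folder can be noticeably slower than the same call on the local file system.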

Detailed guides are available for two of the less straightforward cases:

* [SSHFS](/reference/file-system-and-storage/access-remote-files-with-sshfs.md)
* [SharePoint Online](/reference/file-system-and-storage/access-files-on-sharepoint-online.md)

