Import data

How do you import data into Nuvolos?

Outcome: You bring external data into Nuvolos using the import method that matches the data source.

Before you start

  • You know where the data currently lives (local files, external storage, a public URL, or another database).

  • You have decided whether the data will land in the Nuvolos file system, the data warehouse, or Large File Storage.

Where the data will land

Three storage targets are available in Nuvolos. Pick the one that matches your data size and usage pattern:

  • Nuvolos file system - for flat files (CSV, Parquet, code, documents). Suitable for data volumes in the 1 GB–100 GB range.

  • Scientific Data Warehouse - for tabular data that benefits from SQL queries and joins. Suitable for datasets where row-level access patterns matter.

  • Large File Storage - for very large files (above 100 GB) that do not change often. See Reference › File Storage.
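As a rough illustration, the rules of thumb above can be written as a small decision helper. This is a sketch, not a Nuvolos feature: the function suggest_target, its parameters, and the fallback for cases the list does not cover (for example, files above 100 GB that change often) are assumptions.

```python
from enum import Enum


class Target(Enum):
    FILE_SYSTEM = "Nuvolos file system"
    DATA_WAREHOUSE = "Scientific Data Warehouse"
    LARGE_FILE_STORAGE = "Large File Storage"


def suggest_target(size_gb: float, needs_sql: bool, changes_often: bool) -> Target:
    """Hypothetical helper encoding the rules of thumb listed above."""
    # Assumption: tabular data that you want to query and join with SQL
    # goes to the warehouse regardless of its size.
    if needs_sql:
        return Target.DATA_WAREHOUSE
    # Very large files (above 100 GB) that do not change often go to Large File Storage.
    if size_gb > 100 and not changes_often:
        return Target.LARGE_FILE_STORAGE
    # Assumption: everything else, including small files and large files
    # that change frequently, defaults to the file system.
    return Target.FILE_SYSTEM


print(suggest_target(5, needs_sql=False, changes_often=True).value)
print(suggest_target(250, needs_sql=False, changes_often=False).value)
print(suggest_target(20, needs_sql=True, changes_often=False).value)
```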

Data pipelines (ETL)

A data pipeline is a chain of processes that extracts data from a source and stores it in a target. Typical sources include files available over the internet, web-scraped pages, or existing databases.
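A minimal sketch of the extract, transform, and load steps in plain Python, assuming a small made-up CSV payload in place of a real download and an in-memory SQLite database in place of the actual target. The function names and table layout are hypothetical; this is not the Nuvolos data warehouse API.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a file fetched over the internet.
RAW_CSV = """sensor,reading
s1,20.5
s2,
s3,19.0
"""


def extract(text: str) -> list[dict]:
    """Extract: parse the raw CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with a missing reading and cast it to float."""
    return [(r["sensor"], float(r["reading"])) for r in rows if r["reading"]]


def load(records: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, reading REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")  # stand-in for the real target
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # 2: the row with an empty reading was dropped
```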

For running data pipelines on Nuvolos, we recommend the Apache Airflow application. Airflow lets you build complex workflows as directed acyclic graphs, mixing scripting languages and command-line operations, with built-in monitoring, failover, and scheduling.
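To show what "directed acyclic graph" means for a pipeline, the sketch below orders a hypothetical set of tasks with Python's standard-library graphlib module. It is not Airflow code: Airflow defines tasks with its own operators and adds the scheduling, monitoring, and failover mentioned above. The task names are made up.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each key maps a task to the tasks it depends on.
dependencies = {
    "download_raw": set(),
    "scrape_pages": set(),
    "clean_data": {"download_raw", "scrape_pages"},
    "load_warehouse": {"clean_data"},
    "send_report": {"load_warehouse"},
}


def run(task: str) -> None:
    # Placeholder: a real pipeline would run a script or shell command here.
    print(f"running {task}")


# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for task in order:
    run(task)

# A cycle (a depends on b, b depends on a) is rejected: that is what "acyclic" rules out.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError as err:
    print("cycle detected:", err.args[1])
```

Tasks with no path between them (here download_raw and scrape_pages) may run in either order, which is what allows a scheduler such as Airflow to run them in parallel.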

For Airflow best practices, see the Apache Airflow documentation. For application-specific guidance on running Airflow on Nuvolos, see Reference › Applications.

File uploads

Two paths for getting files into Nuvolos:

  • The Files view in the web UI - best for individual files up to a few hundred MB.

  • Application-specific upload UIs - sometimes more suitable for larger files. For example, RStudio and JupyterLab both have built-in upload features that handle interrupted uploads better than the web UI for very large files.

Download directly from a URL

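If the data you need is available at a public URL, do not download it to your computer first. Instead, run wget (available in every Nuvolos application) from a terminal inside the application, passing the URL of the file, for example: wget https://example.com/data.csv (the URL here is a placeholder for the address of your data).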

See the wget documentation for options.
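If you are working in a Python notebook rather than a terminal, the same in-application download can be sketched with the standard library. The function name download and the default chunk size are assumptions; from a terminal, wget remains the simpler option.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> int:
    """Stream the resource at `url` into `dest` without holding it all in memory.

    Returns the size of the saved file in bytes.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)  # create target folder if needed
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)  # copy in fixed-size chunks
    return dest.stat().st_size
```

For example, download("https://example.com/data.csv", Path("data.csv")) would save the file in the current working directory of the application; the URL is a placeholder.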

Mount external storage

Many cloud storage services can be mounted as folders inside your applications. Supported services include Amazon S3, Azure Files, Dropbox, Google Drive, Box, Mega, and SharePoint Online. Mounting avoids the need to copy data into Nuvolos - the data stays in its original storage and is accessed live by applications.

For the full list of mountable services, see the rclone overview. For the specific Nuvolos integrations, see Reference › File Storage - the connector pages cover Dropbox sync, S3 buckets via rclone, SSHFS, and SharePoint.
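Once a service is mounted, applications see it as an ordinary folder, so standard file APIs read the data in place without copying it. The sketch below assumes a hypothetical mount point (/files/mounted_storage) and a CSV column named amount; both the path and the helper names are illustrative, so adjust them to your setup.

```python
import csv
from pathlib import Path

# Hypothetical mount point: replace with the folder where your storage is mounted.
MOUNT_POINT = Path("/files/mounted_storage")


def list_csv_files(root: Path) -> list[Path]:
    """List CSV files anywhere under the mount point, without copying them."""
    return sorted(root.rglob("*.csv"))


def column_total(csv_path: Path, column: str) -> float:
    """Sum one numeric column, reading the file row by row where it lives."""
    total = 0.0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
    return total
```

For example, looping over list_csv_files(MOUNT_POINT) and calling column_total(path, "amount") on each file processes the remote data directly. Because the data is accessed live from its origin storage, reading row by row, as here, avoids pulling a whole file into memory at once.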

Detailed guidance on some relevant, non-trivial use-cases:
