# Fractional GPUs on Nuvolos

## GPU Slicing on Nuvolos: Device sharing and resource isolation

Allocating a full GPU to a single user is often inefficient. In teaching and foundational research, a workload might only use a fraction of a device's actual capacity, leaving the rest of the resource idle while the institution incurs full costs.

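Nuvolos GPU Slicing splits a single physical device into multiple isolated devices. Unlike hardware partitioning features such as NVIDIA Multi-Instance GPU (MIG), which are available only on specific enterprise-grade GPUs, it works across a range of device tiers. This allows multiple users to share the same hardware simultaneously without interfering with each other's work.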

### How It Works For You

#### 1. For Students

* **Dedicated resources**: Your Application runs on a dedicated, isolated computing device (GPU, NPU, or other accelerator) with a fixed share of the physical device's capacity, i.e., its compute cores and video memory (VRAM).
* **Workload isolation**: Your compute environment is sandboxed. If another user on the same physical card runs code that crashes or overloads their environment, your workspace remains stable and unaffected.
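
To make the idea of a fixed allocation concrete, the Python sketch below splits one device into equal slices. The `Slice` type, the `partition_device` function, the even split, and the 40 GiB device size are illustrative assumptions, not Nuvolos internals; only the limit of eight users per device comes from this page.

```python
from dataclasses import dataclass
from typing import List

MAX_SLICES_PER_DEVICE = 8  # "up to 8 users per device", as stated on this page


@dataclass(frozen=True)
class Slice:
    """One isolated share of a physical device (hypothetical model, not a Nuvolos API)."""
    index: int
    vram_mib: int       # fixed video memory budget for this slice
    core_share: float   # fraction of the device's compute cores, 0.0 to 1.0


def partition_device(total_vram_mib: int, num_slices: int) -> List[Slice]:
    """Split one device into `num_slices` equal, fixed allocations.

    VRAM that does not divide evenly is left unassigned instead of being
    given to one slice, so every student sees an identical profile.
    """
    if not 1 <= num_slices <= MAX_SLICES_PER_DEVICE:
        raise ValueError(f"num_slices must be between 1 and {MAX_SLICES_PER_DEVICE}")
    vram_each = total_vram_mib // num_slices
    core_each = 1.0 / num_slices
    return [Slice(i, vram_each, core_each) for i in range(num_slices)]


# Example: a hypothetical 40 GiB device shared by 8 students.
slices = partition_device(40 * 1024, 8)
print(len(slices), slices[0].vram_mib, slices[0].core_share)  # 8 5120 0.125
```

Leaving the remainder unassigned keeps every slice identical, which matches the goal of giving each student the same hardware profile.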

#### 2. For Educators & TAs

* **Predictable scaling**: Deploy uniform Applications with dedicated, isolated computing devices (GPUs, NPUs, or other accelerators) to anywhere from 10 to 150+ students at once. Nuvolos sources these resources from geographically diverse locations so that the required devices are available to all participants at the same time.
* **Consistent environments**: Every student operates on the identical hardware and software profile, eliminating environment-related troubleshooting during labs.
* **Cost constraints**: Set automated runtime limits and per-student credit quotas. If an Application is left idle, for example over a weekend, Nuvolos shuts it down automatically once the idle period you configure has elapsed, protecting your budget.
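
The auto-stop and quota rules above can be pictured as a simple periodic check. This is a hypothetical sketch: `AppSession`, `stop_reason`, the two-hour idle limit, and the credit figures are illustrative names and values, not the Nuvolos configuration interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AppSession:
    """Hypothetical snapshot of one student's running Application."""
    user: str
    last_activity: datetime
    credits_used: float


def stop_reason(session: AppSession, now: datetime,
                idle_limit: timedelta, credit_quota: float) -> Optional[str]:
    """Return why the Application should be stopped, or None to keep it running."""
    if now - session.last_activity >= idle_limit:
        return "idle limit reached"       # e.g. left running over a weekend
    if session.credits_used >= credit_quota:
        return "credit quota exhausted"   # per-student budget used up
    return None


friday_evening = datetime(2024, 5, 3, 18, 0)
monday_morning = datetime(2024, 5, 6, 8, 0)
idle = AppSession("student-01", last_activity=friday_evening, credits_used=3.0)
print(stop_reason(idle, monday_morning, timedelta(hours=2), credit_quota=10.0))
# idle limit reached
```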

#### 3. For Research

* **Higher resource utilization**: Run up to eight concurrent workloads on a single physical device during active course hours, maximizing the return on your hardware investment.
* **Provider-independent**: The Nuvolos control plane sits above the infrastructure layer. The end-user experience remains identical whether your compute resources run on Azure, SWITCH Cloud, or on-premise servers.
* **Clear telemetry**: Track usage metrics by user, department, or course to convert aggregate cloud bills into clear operational data.
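
As an illustration of turning raw usage into per-group figures, the sketch below sums slice-hours and cost by user, department, or course. The `UsageRecord` fields, the `totals_by` function, the course codes, and the numbers are made up for the example and do not reflect the Nuvolos telemetry schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class UsageRecord(NamedTuple):
    """Hypothetical usage record: one Application run on one device slice."""
    user: str
    department: str
    course: str
    slice_hours: float
    cost: float


def totals_by(records: Iterable[UsageRecord], key: str) -> Dict[str, Tuple[float, float]]:
    """Sum slice-hours and cost per user, department, or course."""
    if key not in ("user", "department", "course"):
        raise ValueError(f"unsupported grouping key: {key!r}")
    acc = defaultdict(lambda: [0.0, 0.0])
    for record in records:
        group = getattr(record, key)
        acc[group][0] += record.slice_hours
        acc[group][1] += record.cost
    return {group: (hours, cost) for group, (hours, cost) in acc.items()}


records = [
    UsageRecord("alice", "Finance", "FIN-101", 2.0, 1.5),
    UsageRecord("bob", "Finance", "FIN-101", 3.0, 2.5),
    UsageRecord("carol", "Physics", "PHY-210", 4.0, 3.0),
]
print(totals_by(records, "course"))
# {'FIN-101': (5.0, 4.0), 'PHY-210': (4.0, 3.0)}
```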

### Traditional Allocations vs. Nuvolos GPU Slicing

| **Feature**          | **Traditional Allocation**                       | **Nuvolos GPU Slicing**                         |
| -------------------- | ------------------------------------------------ | ----------------------------------------------- |
| User Capacity        | 1 User per device                                | Up to 8 Users per device                        |
| Hardware Constraints | Often requires specific enterprise-grade devices | Works across a range of device tiers            |
| Stability            | Shared environments risk full-node crashes       | Strict memory boundaries protect adjacent users |
| Cost Management      | High risk of idle runtime costs                  | Enforced auto-stop timers and credit limits     |

<figure><img src="/files/GhNrDYYsgwl59i8xXyXY" alt="Traditional vs. Fractional device allocation"><figcaption><p>Traditional vs. Fractional device allocation</p></figcaption></figure>

> #### Behind the Scenes
>
> To deliver this, Nuvolos turns raw bare-metal infrastructure into an expandable platform that operates uniformly across multiple cloud providers. On top of this platform, a toolkit dynamically slices physical devices and delivers isolated processing power directly to individual Applications, regardless of where the underlying hardware physically sits.
>
> The system injects a custom scheduler into the Nuvolos control plane. When an Application container requests a fraction of a device, our runtime library intercepts memory allocations at the user-space boundary. If an Application attempts to exceed its assigned allocation, the library blocks the overflow, keeping the underlying node and all adjacent users completely stable.
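
The following is a minimal sketch of the accounting such a runtime library performs, written in Python purely for illustration. `SliceMemoryGuard`, `allocate`, and `free` are hypothetical names, not the Nuvolos API, and the 4 GiB slice size is an assumed value; the real interception happens at the user-space boundary described above, not in Python, and this sketch calls no GPU API.

```python
import threading


class SliceMemoryGuard:
    """Hypothetical guard that enforces one slice's VRAM limit.

    Every allocation request passes through the guard, which refuses any
    request that would push the slice past its limit. Other slices keep
    their own guards, so a refusal here never affects them.
    """

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._lock = threading.Lock()  # allocations may come from several threads

    def allocate(self, nbytes: int) -> None:
        if nbytes <= 0:
            raise ValueError("allocation size must be positive")
        with self._lock:
            remaining = self.limit_bytes - self.used_bytes
            if nbytes > remaining:
                # Block the overflow inside this Application only.
                raise MemoryError(
                    f"requested {nbytes} B, but only {remaining} B remain in this slice")
            self.used_bytes += nbytes

    def free(self, nbytes: int) -> None:
        with self._lock:
            if nbytes > self.used_bytes:
                raise ValueError("freeing more memory than is allocated")
            self.used_bytes -= nbytes


guard = SliceMemoryGuard(limit_bytes=4 * 1024**3)  # hypothetical 4 GiB slice
guard.allocate(3 * 1024**3)
try:
    guard.allocate(2 * 1024**3)  # would exceed the slice, so it is refused
except MemoryError as err:
    print("blocked:", err)
print(guard.used_bytes == 3 * 1024**3)  # True
```

Refusing the request up front, rather than letting it reach the device, is what keeps the failure local to the offending Application.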

### Join the Early Access Program

GPU Slicing is currently in beta testing while we validate its stability, performance boundaries, and multi-cloud compatibility.

While this feature is not yet live in the standard Nuvolos dashboard, we are opening a waitlist for universities, research institutes, and educators who want to participate in our upcoming beta phases.

#### Why join the waitlist?

* Get early access to beta testing templates.
* Work with our engineering team to align the feature with your existing infrastructure (including major cloud providers and on-premise).
* Secure priority deployment for your upcoming course cohorts.

You can join the waitlist using the following link: [Waitlist Form](https://2f4lzm.share-eu1.hsforms.com/2ZIGzH7TCSBGvH8rKWhu6Nw)

