GPU Computation

Outcome

You configure an application to use a GPU, install the right libraries, and verify that the setup works.

Before you start

  • Your space has credit-based application sizes enabled. See Administration › Space management.

  • You have enough Credits in the resource pool mapped to the space.

  • You know which framework you will use - PyTorch, TensorFlow, XGBoost, or another.

GPU acceleration on Nuvolos requires two things:

  • A GPU-enabled size for the application. By default, applications run on nodes without GPUs. To use a GPU, scale the application to a GPU-enabled size. All GPU-enabled sizes are credit-based.

  • Properly configured libraries. The remainder of this guide covers per-framework setup so your Application actually uses the available GPU.

Library versions

The NVIDIA device drivers are automatically loaded in all GPU-enabled sizes. However, depending on the software you use, additional components (e.g., CUDA toolkit) might need to be installed via conda.

If you launch an app in a GPU-enabled size on Nuvolos, the nvidia-smi tool will be available from the command line/terminal. You can use this to check the driver version and monitor memory usage of the card.

$ nvidia-smi
Thu Jun  1 08:39:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    333MiB /  4096MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Due to the underlying virtualization technology in Nuvolos, the nvidia-smi tool is currently unable to list processes using the GPU, which is why the Processes table in the output above is empty.
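Even without the process list, overall memory usage of the card can still be polled from Python. The following is a minimal sketch, assuming nvidia-smi is on the PATH and reports numeric memory values for every GPU. The helper names parse_memory_csv and query_gpu_memory are illustrative and not part of any Nuvolos tooling.

```python
import subprocess


def parse_memory_csv(csv_text):
    """Parse 'used, total' MiB pairs (one line per GPU) into tuples of ints."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Run nvidia-smi and return [(used_mib, total_mib), ...], one entry per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi fails
    )
    return parse_memory_csv(result.stdout)
```

On a GPU-enabled size, calling query_gpu_memory() in a loop with a short sleep gives a simple memory monitor. On a size without a GPU the call fails, because nvidia-smi is not available there.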

Note that nvidia-smi reports the CUDA Driver API version in its output (11.6). However, most high-level machine learning frameworks use the CUDA Runtime API as well, which is provided by the CUDA Runtime library. Most frameworks can automatically install the required version of the runtime, so if you're starting from scratch, this should be straightforward to set up.
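If you want to read the two version numbers programmatically, the header line can be parsed with a regular expression. This is a sketch that assumes the header layout shown in the sample output above. The function name parse_versions is illustrative.

```python
import re

# Matches the header line of the nvidia-smi output shown above.
HEADER_PATTERN = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_versions(smi_output):
    """Return (driver_version, cuda_driver_api_version) as strings."""
    match = HEADER_PATTERN.search(smi_output)
    if match is None:
        raise ValueError("no version header found in nvidia-smi output")
    return match.group("driver"), match.group("cuda")


sample = (
    "| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08"
    "    CUDA Version: 11.6     |"
)
print(parse_versions(sample))  # ('510.73.08', '11.6')
```

The second value is the CUDA Driver API version, not the version of any CUDA Runtime library installed in your environment.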

Please find examples below on how to get started with GPU computations on Nuvolos, or consult the relevant machine learning library documentation directly. If you need additional support, reach out to our team.

GPU Monitoring

For interactive GPU monitoring, install the nvitop package:

Large Language Models

Two practical guidelines for running LLMs on Nuvolos:

Python

Installing the right CUDA version for Python packages can be intricate. Start with a clean Application and install your high-level ML library (PyTorch, TensorFlow) first - these libraries pull in the exact CUDA Runtime version they need. Install other libraries afterwards.

PyTorch

For PyTorch, pip works better than conda - pip does not try to overwrite system libraries:

The standard PyTorch pip command installs PyTorch with the latest major CUDA Runtime version (12). On Nuvolos, all GPUs currently support version 12 except the A10 card. If you need to run on an A10, install PyTorch with the older CUDA Runtime version 11.
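To see why the A10 case differs, you can compare the CUDA Runtime version that PyTorch was built with against the CUDA version that nvidia-smi reports for the driver. The sketch below uses a simplified rule of thumb (the runtime major version must not exceed the driver's CUDA major version). It is an assumption for illustration, not a full compatibility matrix, and the helper names are hypothetical.

```python
def cuda_major(version):
    """Return the major component of a version string such as '11.6'."""
    return int(version.split(".")[0])


def runtime_fits_driver(runtime_version, driver_cuda_version):
    """Rule of thumb: the runtime major must not exceed the driver's CUDA major."""
    return cuda_major(runtime_version) <= cuda_major(driver_cuda_version)


def installed_torch_runtime():
    """Return the CUDA Runtime version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    return torch.version.cuda  # also None for CPU-only builds


# A driver reporting CUDA 11.6 (as in the sample output) with a CUDA 12 runtime:
print(runtime_fits_driver("12.1", "11.6"))  # False
```

The runtime versions "12.1" and "11.8" used here and in similar checks are example values only. Pass the string returned by installed_torch_runtime() together with the CUDA version from nvidia-smi to check your own environment.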

You do not need a GPU available in your application to install PyTorch with GPU support - install on any size, then scale up to a GPU-enabled size. To verify the installation works on a GPU-enabled size:
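A minimal sketch of such a check, assuming PyTorch is already installed in the application. The function name verify_torch_gpu and the matrix size are illustrative choices.

```python
def verify_torch_gpu():
    """Run a small computation on the first GPU and return the device name.

    Raises ImportError if PyTorch is missing and RuntimeError if PyTorch
    cannot see a CUDA device.
    """
    import torch

    if not torch.cuda.is_available():
        raise RuntimeError("PyTorch cannot see a CUDA device")

    device = torch.device("cuda:0")
    x = torch.rand(1024, 1024, device=device)  # tensor allocated on the GPU
    y = x @ x  # matrix multiplication executed on the GPU
    torch.cuda.synchronize()  # wait until the GPU has finished the work
    return torch.cuda.get_device_name(0)
```

Call verify_torch_gpu() from a Python session or notebook cell. On a GPU-enabled size it returns the name of the card.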

If it completes without an error, your configuration is correct.

Note that pip will install the runtime libraries needed by PyTorch, but will not set up a complete developer environment that you could use outside Python (see official notes). To use tools like nvcc from the command line, please install the CUDA Toolkit via conda instead.
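To check from Python whether nvcc is available in the current environment, a sketch along the following lines can help. It only looks for an executable named nvcc on the PATH, and the helper names are illustrative.

```python
import shutil
import subprocess


def find_nvcc():
    """Return the full path of nvcc if it is on the PATH, otherwise None."""
    return shutil.which("nvcc")


def nvcc_version_text():
    """Return the output of `nvcc --version`, or None if nvcc is missing."""
    path = find_nvcc()
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

If nvcc_version_text() returns None after a pip-only PyTorch installation, that is expected: pip installs the runtime libraries but not the compiler.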

NVCC (CUDA compiler)

To compile CUDA executables with nvcc, install the compiler binaries and the C runtime headers:

Both packages are available in CUDA 11 and 12 versions. For the full toolkit:

TensorFlow

To install TensorFlow, we recommend using conda, because TensorFlow requires the cudatoolkit package.

Replace CUDA_VERSION with the version reported by nvidia-smi. If you do not need the latest CUDA version, start with an older one (such as 11.6) for compatibility with older GPU cards.

Install TensorFlow and cudatoolkit from the same conda channel when possible.

You don't need a GPU available in your running app to install TensorFlow with GPU support. It's sufficient to scale up to a GPU-enabled size after installation is complete. To test if your installation was successful, execute the following code snippet while on a GPU-enabled size:
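A minimal sketch of such a test, assuming TensorFlow 2.x is installed. It lists the GPUs that TensorFlow can see. The helper names list_tf_gpus and is_first_gpu are illustrative.

```python
def list_tf_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]


def is_first_gpu(device_name):
    """True for device names such as '/physical_device:GPU:0'."""
    return device_name.endswith("GPU:0")
```

Calling print(list_tf_gpus()) on a GPU-enabled size should show at least one device name. An empty list means TensorFlow did not find a usable GPU.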

If the output ends with GPU:0, your configuration is correct.

RStudio

With Machine Learning (CUDA-enabled) RStudio images, you can run GPU computations on GPU-accelerated nodes. These images have the CUDA runtime/toolkit installed as well.

XGBoost

We recommend using the pre-built experimental binary to get started with XGBoost and R. In a terminal on a GPU node:

Test with the XGBoost GPU acceleration demo.

TensorFlow and Keras

You can use TensorFlow with GPU acceleration by following our TensorFlow installation guide and specifying version = "gpu" when installing TensorFlow.
