Databricks Connect

Nuvolos now offers a VSCode application with Python 3.9, R 4.2, and Databricks Connect (databricks-connect) pre-installed. From this application, you can submit Spark jobs to Databricks-hosted Spark clusters.

PySpark and sparklyr are both installed in the application.

Prerequisites

Databricks Connect supports Databricks clusters up to 10.4 LTS.

To configure the connection, you need:

  • the URL of your Databricks cluster, and

  • a personal access token. Personal access tokens are not available in Databricks Community Edition.

To connect from Nuvolos:

  1. Create a Databricks 10.4 LTS + Py39 + R 4.2 application.

  2. Start the application.

  3. Open a terminal and run:

    databricks-connect configure

  4. Enter the Databricks cluster URL and your personal access token when prompted.

  5. Verify the setup with:

    databricks-connect test
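If the test fails, it can help to check what the configure step actually stored. The sketch below assumes that the client keeps its settings as JSON in ~/.databricks-connect with the fields host, token, cluster_id, org_id and port. That file location and those field names are assumptions about the client, not something this page specifies, and the helper names are made up for illustration.

```python
import json
from pathlib import Path
from typing import List

# Assumed field names of the Databricks Connect config file (not confirmed by this page).
EXPECTED_FIELDS = ("host", "token", "cluster_id", "org_id", "port")


def missing_fields(config_path: Path) -> List[str]:
    """Return the expected fields that are absent or empty in the config file."""
    if not config_path.exists():
        # No file at all: every field counts as missing.
        return list(EXPECTED_FIELDS)
    config = json.loads(config_path.read_text())
    return [name for name in EXPECTED_FIELDS if not config.get(name)]


def report(config_path: Path = Path.home() / ".databricks-connect") -> None:
    """Print which fields, if any, still need to be configured."""
    missing = missing_fields(config_path)
    if missing:
        print(f"Missing in {config_path}: {', '.join(missing)}")
    else:
        print(f"{config_path} contains all expected fields")
```

Calling report() from a Python session in the application prints the fields that are still empty, which narrows down what to re-enter in databricks-connect configure.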

Python example

To run the example, install the python-slugify package (imported as slugify) with the following command:

conda install -y -c conda-forge python-slugify

Once you have configured the Databricks connection, you can try the following simple example to create a Databricks table and run a SQL query on the table:
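A minimal sketch of such an example could look like the following. The table label, the sample rows and the helper names are illustrative assumptions, and the code expects databricks-connect to be configured so that the Spark session it obtains is backed by your Databricks cluster.

```python
from typing import List, Tuple


def make_table_name(label: str) -> str:
    """Turn a free-form label into a name usable as a table identifier."""
    # Imported lazily so the other helpers work without the package installed.
    from slugify import slugify  # provided by python-slugify

    return slugify(label, separator="_")


def sample_rows() -> List[Tuple[int, str]]:
    """Hypothetical demo data: (id, city) pairs."""
    return [(1, "Zurich"), (2, "Geneva"), (3, "Zurich")]


def count_query(table: str) -> str:
    """Build a SQL query that counts rows per city in the given table."""
    return f"SELECT city, COUNT(*) AS n FROM {table} GROUP BY city ORDER BY city"


def run_example() -> None:
    """Create the table on the cluster and run the query against it."""
    from pyspark.sql import SparkSession

    # With Databricks Connect configured, this session is expected to run
    # its jobs on the remote cluster.
    spark = SparkSession.builder.getOrCreate()

    table = make_table_name("Nuvolos demo table")
    df = spark.createDataFrame(sample_rows(), schema=["id", "city"])

    # Overwrite so the example can be re-run without a "table exists" error.
    df.write.mode("overwrite").saveAsTable(table)

    spark.sql(count_query(table)).show()
```

Call run_example() from a Python session in the application once databricks-connect test succeeds. The Spark imports sit inside the function so that the helper functions can be tried out without a cluster connection.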

R example

The sparklyr package is pre-installed in the application. It allows you to connect to the Databricks Spark clusters that you configured with databricks-connect.

You can run the following R script example to run a simple job on your Databricks cluster:
