> For the complete documentation index, see [llms.txt](https://docs.nuvolos.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.nuvolos.com/how-to-guides/application-specific-guides/databricks-connect.md).

# Databricks Connect

Nuvolos now offers a VSCode application with Python 3.9 and R 4.2 and Databricks Connect (`databrics-connect`) pre-installed. From this application, you can submit Spark jobs to Databrics-hosted Spark clusters.

[PySpark](https://spark.apache.org/docs/latest/api/python/) and [sparklyR](https://spark.rstudio.com/) are both installed in the application.

### Prerequisites

{% hint style="info" %}
[Databricks Connect](https://docs.databricks.com/dev-tools/databricks-connect.html) supports Databricks clusters up to 10.4 LTS.
{% endhint %}

To configure the connection, you need:

* the URL of your Databricks cluster, and
* a [personal access token](https://docs.databricks.com/dev-tools/api/latest/authentication.html#token-management).\
  Personal access tokens are not available in Databricks Community Edition.

To connect from Nuvolos:

1. Create a **Databricks 10.4 LTS + Py39 + R 4.2** application.
2. Start the application.
3. Open a terminal and run:

   ```
   databricks-connect configure
   ```
4. Enter the [Databricks cluster URL](https://docs.databricks.com/dev-tools/databricks-connect.html#step-2-configure-connection-properties) and your [personal access token](https://docs.databricks.com/dev-tools/api/latest/authentication.html#token-management) when prompted.
5. Verify the setup with:

   ```
   databricks-connect test
   ```

### Python example

To run the example, please install the `slugify` Python package with the following command:

`conda install -y -c conda-forge python-slugify`

Once you have configured the Databricks connection, you can try the following simple example to create a Databricks table and run a SQL query on the table:

```python
import pandas as pd
from pyspark.sql import SparkSession
from slugify import slugify

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled","true")

# Use NYC Squirrel Census data
df = pd.read_csv("https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD")
df.columns = [slugify(c, separator="_") for c in df.columns]
df = df.drop(columns=["vehicle_location", "electric_utility"])

df = spark.createDataFrame(df)
df.write.mode('overwrite').saveAsTable('ev_data')

spark.sql('select make, model, count(*) as registered from ev_data group by make, model order by registered desc').show(10)
```

### R example

The `sparklyr` package is pre-installed in the application which allows you to connect to Databricks Spark clusters, configured with `databricks-connect.`

You can run the following R script example to run a simple job on your Databricks cluster:

```r
library(sparklyr)
library(dplyr)

databricks_connect_spark_home <- system("databricks-connect get-spark-home", intern = TRUE)
sc <- spark_connect(method = "databricks", spark_home = databricks_connect_spark_home)

cars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

cars_tbl %>% 
  group_by(cyl) %>% 
  summarise(
    mean_mpg = mean(mpg, na.rm = TRUE),
    mean_hp  = mean(hp, na.rm = TRUE)
    )

print(as_tibble(cars_tbl), n = 10)

spark_disconnect(sc)
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.nuvolos.com/how-to-guides/application-specific-guides/databricks-connect.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
