# A complete database research workflow (Matlab & RStudio)

Nuvolos supports a query-analyze-insert cycle to help researchers query, analyse and store data safely in a simple, integrated workflow. The workflow is demonstrated in two languages, but can be extended to any languages supported by Nuvolos.

## Matlab workflow

A standard skeleton scientific workflow in Matlab can be broken down into three main steps:

1. Query research-relevant data
2. Analyse, transform or otherwise manipulate data
3. Store results

### Querying relevant data

For the mock example, we are going to work with the Fama-French factor set that is available for our Demo user, we will be focusing on the [North America](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_5developed.html) 5-factor table. The `NORTH_AMERICA_5_FACTORS` table has been [distributed to the instance we are working in](/features/database-integration.md#distribute-data-you-need).

After opening the Matlab application, the following bit of code will return the entire database table as a Matlab table-type object. The query we are executing is a merge of a monthly stock series table (for Apple monthly stock prices) and the Fama-French factor table as follows.

The SQL query:

```
SELECT NAF.*, SM.MPRC, SM.MRET*100, SM.MTCAP 
FROM NORTH_AMERICA_5_FACTORS NAF 
INNER JOIN TIME_SERIES_MONTHLY SM 
ON SM.MCALDT = NAF.DATE 
WHERE KYPERMNO = 14593
```

The code that executes the query, the above string is saved in `query_string`.

```
con = get_connection();
dataset_factor = select(con, query_string);
```

### The simple analysis

The previous step resulted in `dataset_factor` containing a Matlab `Table` object that holds the data. The `fitlm` method fits a linear regression on the table with an R-style formula. We then write back the fitted values to the Table as a column.

```
dataset_factor.EXCESS_RETURN = dataset_factor.SM_MRET_100 - dataset_factor.RF;
mod = fitlm(dataset_factor,'EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA')
dataset_factor.FIT_FACTOR_5 = mod.Fitted;
```

### Storing results in the database

As a final step, we write back the results using the data upload command for Matlab:

```
sqlwrite(con,'APPLE_5FACTOR_FIT',dataset_factor);
```

## RStudio workflow

A standard skeleton scientific workflow in RStudio can be broken down into three main steps:

1. Query research-relevant data
2. Analyse, transform or otherwise manipulate data
3. Store results

You can create more complex workflows, but they will usually be built from three-step modules like the one above

### Querying relevant data

For the mock example, we are going to work with the Fama-French factor set that is available for our Demo user, we will be focusing on the [North America](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_5developed.html) 5-factor table. The `NORTH_AMERICA_5_FACTORS` [table has been distributed to the instance we are working in](/features/database-integration.md#distribute-data-you-need).

After opening the RStudio application, the following bit of code will return the entire database table as an R data frame object. The query we are executing is a merge of a monthly stock series table (for Apple monthly stock prices) and the Fama-French factor table as follows.

The SQL query:

```
SELECT NAF.*, SM.MPRC, SM.MRET*100 AS SM_MRET_100, SM.MTCAP 
FROM NORTH_AMERICA_5_FACTORS NAF 
INNER JOIN TIME_SERIES_MONTHLY SM 
ON SM.MCALDT = NAF.DATE 
WHERE KYPERMNO = 14593
```

The code that executes the query, the above string is saved in `query_string`.

```
conn <- nuvolos::get_connection()
dataset_factor <- dbGetQuery(conn, query_string)
```

### The simple analysis

The previous step resulted in `dataset_factor` containing an R `data.frame` object that holds the data. The `lm` method fits a linear regression on the data frame. We put the fitted values to the data frame.

```
dataset_factor$EXCESS_RETURN <- dataset_factor$SM_MRET_100 - dataset_factor$RF
mod <- lm(EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA, dataset_factor)
dataset_factor$FIT_FACTOR_5 <- mod$fitted.values
```

### Storing results in the database

As a final step, we write back the results using the data upload command for R:

```
DBI::dbWriteTable(conn, name="APPLE_5FACTOR_FIT", value=dataset_factor, batch_rows = 10000)
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.nuvolos.com/user-guides/data-guides/a-complete-database-research-workflow-matlab-and-rstudio.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
