Database research workflow - Matlab & RStudio

Outcome You query data from the Scientific Data Warehouse, analyse it in MATLAB or RStudio, and write results back as a new table.

Before you start

  • Tables are activated in your space. See Reference › Data storage if not.

  • You have at least one MATLAB or RStudio application configured.

  • You have identified the source tables for your analysis.

Nuvolos supports a query–analyse–insert cycle. The example below uses the Fama-French 5-factor table (NORTH_AMERICA_5_FACTORS) joined with a monthly stock returns table - the same pattern extends to any tables you have access to.

A standard skeleton scientific workflow in MATLAB/RStudio can be broken down into three main steps:

  1. Query research-relevant data

  2. Analyse, transform or otherwise manipulate data

  3. Store results

You can create more complex workflows, but they will usually be built from three-step modules like the one above.

MATLAB workflow

1

Querying relevant data

After opening the MATLAB application, define your SQL query as a string and execute it:

query_string = "SELECT NAF.*, SM.MPRC, SM.MRET*100 AS SM_MRET_100, SM.MTCAP " + ...
               "FROM NORTH_AMERICA_5_FACTORS NAF " + ...
               "INNER JOIN TIME_SERIES_MONTHLY SM " + ...
               "ON SM.MCALDT = NAF.DATE " + ...
               "WHERE KYPERMNO = 14593";
 
con = get_connection();
dataset_factor = select(con, query_string);
2

The simple analysis

Use MATLAB's fitlm to fit a linear regression with an R-style formula:

dataset_factor.EXCESS_RETURN = dataset_factor.SM_MRET_100 - dataset_factor.RF;
 
mod = fitlm(dataset_factor, ...
  'EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA');
 
dataset_factor.FIT_FACTOR_5 = mod.Fitted;
3

Write results back

Use the sqlwrite command to push the results back to the warehouse as a new table:

sqlwrite(con, 'APPLE_5FACTOR_FIT', dataset_factor);

RStudio workflow

1

Querying relevant data

After opening the RStudio application, define your SQL query as a string and execute it:

query_string <- "
SELECT NAF.*, SM.MPRC, SM.MRET*100 AS SM_MRET_100, SM.MTCAP
FROM NORTH_AMERICA_5_FACTORS NAF
INNER JOIN TIME_SERIES_MONTHLY SM
  ON SM.MCALDT = NAF.DATE
WHERE KYPERMNO = 14593
"
 
conn <- nuvolos::get_connection()
dataset_factor <- dbGetQuery(conn, query_string)
2

The simple analysis

Use RStudio's fitlm to fit a linear regression with an R-style formula:

dataset_factor$EXCESS_RETURN <- dataset_factor$SM_MRET_100 - dataset_factor$RF
 
mod <- lm(EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA,
          dataset_factor)
 
dataset_factor$FIT_FACTOR_5 <- mod$fitted.values
3

Write results back

As a final step, we write back the results using the data upload command for R:

DBI::dbWriteTable(conn, name = "APPLE_5FACTOR_FIT",
                  value = dataset_factor, batch_rows = 10000)

For the full data warehouse reference - Python and Stata connectors, off-Nuvolos access, ODBC drivers, big-data exports - see Reference › Data storage.

Last updated

Was this helpful?