# Improve your data analysis workflow with the drake R package

A quick guide.

Miha Gazvoda https://mihagazvoda.com
10-30-2020

drake is an R package by Will Landau that analyzes your workflow. It

• lets you skip steps of the analysis whose results are already up to date;
• provides evidence that results match the underlying code and data;
• encourages good programming practices by modularizing your code into functions;
• interactively visualizes a network representation of your workflow.

# Setup

Install drake. You can also download an example written by Kirill Müller; it will appear in a new folder named main. I use it below to show what some of the project files look like.

# Install and load drake
install.packages("drake")
library(drake)

# Get an example in a new main folder
drake::drake_example("main")

# You can use drake::drake_examples() to see all examples


## Project structure

It’s suggested that you start your project using this structure:

make.R
R/
├── packages.R
├── functions.R
└── plan.R
data/

You can also use dflow::use_dflow() to create a nearly identical structure.

### Make

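make.R is a master script that

• loads your packages and functions;
• creates a drake plan;
• calls make().

# make.R
source("R/packages.R")  # loads packages, including drake
source("R/functions.R") # defines the functions used in the plan
source("R/plan.R")      # creates drake plan

make(plan)              # plan is defined in R/plan.R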
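
To see how make() skips work that is already up to date, here is a minimal sketch. The toy plan, its target names (four_cyl, fit) and the temporary cache are my own assumptions for illustration; they are not part of the main example. It also assumes a drake version in which outdated() accepts a plan directly.

```r
library(drake)

# Hypothetical toy plan built on the mtcars data set that ships with R.
plan <- drake_plan(
  four_cyl = mtcars[mtcars$cyl == 4, ],
  fit = lm(mpg ~ wt, data = four_cyl)
)

# Keep the cache in a temporary folder instead of the default .drake/ folder.
cache <- new_cache(path = tempfile())

before <- outdated(plan, cache = cache)  # targets that still need to be built
make(plan, cache = cache)                # first call builds both targets
after <- outdated(plan, cache = cache)   # targets that are still out of date
make(plan, cache = cache)                # second call has nothing left to build
```

Comparing before and after shows which targets the first make() call brought up to date.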


### Plan

A drake plan is the high-level catalog of data analysis steps (such as data cleaning, model fitting, visualization, and reporting) in a workflow.
It is represented as a data frame with columns named target and command.

# plan.R
# The workflow plan data frame outlines what you are going to do.
plan <- drake::drake_plan(
  # target = command
  data = raw_data %>%
    mutate(Species = forcats::fct_inorder(Species)),
  hist = create_plot(data),
  fit = lm(Sepal.Width ~ Petal.Width + Species, data),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)


Each row of the plan represents a step in the workflow. Each command is a concise expression that makes use of our functions, and each target is the return value of the command.
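
You can check the data frame structure yourself. The sketch below uses a small made-up plan (the target names four_cyl and fit are hypothetical) rather than the one from the main example, so it runs without any extra files or packages.

```r
library(drake)

# Hypothetical toy plan: each argument is written as target = command.
plan <- drake_plan(
  four_cyl = mtcars[mtcars$cyl == 4, ],
  fit = lm(mpg ~ wt, data = four_cyl)
)

# The plan is an ordinary data frame with one row per target.
is.data.frame(plan)
colnames(plan)
plan$target
```

Nothing is computed at this point: drake_plan() only records the commands, and make() runs them later.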

The plan object:

| target | command |
|--------|---------|
| data | raw_data %>% mutate(Species = forcats::fct_inorder(Species)) |
| hist | create_plot(data) |
| fit | lm(Sepal.Width ~ Petal.Width + Species, data) |
| report | rmarkdown::render(knitr_in("report.Rmd"), output_file = file_out("report.html"), quiet = TRUE) |

[Figure: dependency graph of the plan]

#### Choose good targets

As Will Landau proposed, a good target is

• long enough to eat up a decent chunk of runtime;
• small enough that make() frequently skips it;
• an R object compatible with saveRDS().
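
A quick way to check the last point is a round trip through saveRDS() and readRDS(). This is only a sketch in base R with a model of my own choosing; it does not involve drake.

```r
# Fit a small model on a data set that ships with R.
fit <- lm(mpg ~ wt, data = mtcars)

# Write the object to a temporary file and read it back.
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)
restored <- readRDS(path)

# The restored model carries the same coefficients as the original.
all.equal(coef(fit), coef(restored))
```

Objects that wrap external resources, such as open database connections, generally do not survive this round trip, which makes them poor targets.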

# Workflow

Even if you use drake, it makes sense to develop interactively. Build your project with r_make("make.R") (see footnote 3). Then use loadd() and readd() to bring targets back into your session and work with them interactively as you develop things further.
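
Here is a minimal sketch of that loop. To keep it self-contained I call make() directly on a made-up plan with a temporary cache instead of running r_make() on a make.R file; the target names are hypothetical.

```r
library(drake)

# Hypothetical toy plan and a temporary cache, for illustration only.
plan <- drake_plan(
  four_cyl = mtcars[mtcars$cyl == 4, ],
  fit = lm(mpg ~ wt, data = four_cyl)
)
cache <- new_cache(path = tempfile())
make(plan, cache = cache)

# readd() returns the value of a single target.
fit_value <- readd(fit, cache = cache)

# loadd() assigns the target into your session under its own name.
loadd(four_cyl, cache = cache)
nrow(four_cyl)
```

In a real project you would leave out the cache argument, because drake then uses the .drake/ folder in the project root.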

## Basic commands

Here are the most useful commands.

| function | description |
|----------|-------------|
| clean() | Force targets to be out of date and remove target names from the data in the cache. |
| vis_drake_graph() | Show an interactive visual network representation of your workflow. |
| code_to_function() | Create functions from scripts so you can pass them as commands in drake plan. |

You can find more functions in the drake README.
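
As a small illustration of clean(), the sketch below invalidates one target and then checks what is left. The plan and target names are made up, the cache lives in a temporary folder, and I assume a drake version in which outdated() accepts a plan directly.

```r
library(drake)

# Hypothetical toy plan and a temporary cache, for illustration only.
plan <- drake_plan(
  four_cyl = mtcars[mtcars$cyl == 4, ],
  fit = lm(mpg ~ wt, data = four_cyl)
)
cache <- new_cache(path = tempfile())
make(plan, cache = cache)

# Remove one target from the cache, which forces it to be rebuilt.
clean(four_cyl, cache = cache)

remaining <- cached(cache = cache)     # targets still stored in the cache
stale <- outdated(plan, cache = cache) # targets the next make() will rebuild
```

Calling clean() without target names removes every target, so use it with care.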
3. If you name the make file _drake.R instead, you can call r_make() without an argument.