rOpenSci | Reproducible workflows at scale with drake

Community Call: Reproducible workflows at scale with drake

September 24, 2019

Ambitious workflows in R, such as machine learning analyses, can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R package. drake resolves the dependency structure of your analysis pipeline, skips tasks that are already up to date, executes the rest with optional distributed computing, and organizes the output so you rarely have to think about data files. This talk demonstrates how to create and maintain a realistic machine learning project using drake-powered automation.
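
As a rough illustration of the workflow described above, the sketch below declares a small drake plan and runs it twice. The helper functions `simulate_data()` and `fit_model()` and the target names `raw`, `model`, and `coefs` are hypothetical, invented for this example; they are not taken from the talk. `drake_plan()`, `make()`, and `readd()` are functions from the drake package.

```r
library(drake)

# Hypothetical helper functions, invented for this sketch.
simulate_data <- function(n) {
  data.frame(x = seq_len(n), y = 2 * seq_len(n) + rnorm(n))
}

fit_model <- function(data) {
  lm(y ~ x, data = data)
}

# Each argument is a target; drake infers that `model` depends on `raw`
# and `coefs` depends on `model` by inspecting the commands.
plan <- drake_plan(
  raw = simulate_data(100),
  model = fit_model(raw),
  coefs = coef(model)
)

make(plan)  # first run: builds all three targets and stores them in the cache
make(plan)  # second run: targets are already up to date, so they are skipped

# Read a stored target back from the cache instead of managing files by hand.
coefs <- readd(coefs)
```

Under these assumptions, editing the body of `fit_model()` and calling `make(plan)` again should rebuild `model` and `coefs` while leaving `raw` untouched, since `raw` does not depend on that function.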

Resources

Announcement blog post
Collaborative notes
The drake R package
The drake R Package User Manual
Self-guided workshop to learn drake
drakeplanner Shiny app
Amanda Dobbyn's talk at NYR 2019: a "simple" use case with a clear explanation
Garrick Aden-Buie's Reproducible Data Workflows With Drake
Kirill Müller's cheatsheet
Matt Dray's tutorial "Can {drake} RAP?"
Functional programming in R, from Advanced R, by Hadley Wickham
Write your own R functions, from STAT 545, by Jenny Bryan and course TAs

Video recording

Tags: community call, drake, events, high performance computing, pipeline, R, reproducibility, reproducible-research, workflow
