Reproducible Data Science Using RMarkdown, Git, R Packages, Docker, Make & Drake And Other Tools (RDRP01R)
5th May 2025
£400.00
About This Course
This course provides a comprehensive introduction to reproducible data analysis, which we define as analysis where the entire workflow or pipeline is as open and transparent as possible, so that others, including our future selves, can exactly reproduce any of its results. We cover this topic through a thorough introduction to a set of R based and general computing tools, namely RMarkdown, Git & GitHub, R packages, Docker, GNU Make and Drake, and show how they can be used together to do reproducible data analysis that can then be shared with others.
After a general introduction on Day 1, where we introduce the core concept of a research compendium, we begin by covering RMarkdown, knitr and related tools. These are vital tools for reproducible research that allow us to produce data analysis reports (articles, slides, posters, websites, etc.) by embedding analysis code (R, Python, etc.) within the text of the report. The code is executed when the report is built, and the results it produces are inserted into the final output document.
On Day 2, we provide a comprehensive introduction to version control using Git, including using GitHub. Git and GitHub are vital tools for the organization, maintenance, and distribution of our code, especially for large scale and long term projects involving multiple collaborators.
On Day 3, we cover how to create, maintain, and distribute R packages. R packages are the principal means of distributing reusable R code, and here we will also look at how they can be used to create, maintain, and distribute research compendia.
On Day 4, we cover Docker, which is now a very popular means of producing reproducible computing environments across different devices, platforms, and operating systems.
On Day 5, we cover build automation tools, particularly GNU Make and Drake, which are used for automatically running complex analysis code that involves multiple inter-dependencies between files. GNU Make is a general purpose build automation tool, while Drake is specifically designed for complex data analysis pipelines in R.
On each day, therefore, we aim to provide a comprehensive and thorough introduction to a set of valuable and generally useful computing tools, each of which plays a key role in allowing us to do reproducible data science.
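The idea behind build automation tools, rerunning only the steps whose inputs have changed, can be illustrated with a minimal Python sketch. This is not how GNU Make or Drake is implemented: the file names, the `rules` dictionary, and the functions `outdated` and `build` are hypothetical, and numeric timestamps stand in for real file modification times.

```python
from typing import Dict, List, Optional

# Hypothetical pipeline: each target maps to the files it depends on.
rules: Dict[str, List[str]] = {
    "clean.csv": ["raw.csv", "clean.R"],
    "model.rds": ["clean.csv", "fit.R"],
    "report.html": ["model.rds", "report.Rmd"],
}

# Pretend modification times (larger means more recently changed).
mtimes: Dict[str, float] = {
    "raw.csv": 1, "clean.R": 1, "fit.R": 1, "report.Rmd": 1,
    "clean.csv": 2, "model.rds": 3, "report.html": 4,
}


def outdated(target: str, deps: List[str], mtimes: Dict[str, float]) -> bool:
    """A target is outdated if it is missing or older than any dependency."""
    if target not in mtimes:
        return True
    return any(mtimes.get(dep, float("inf")) > mtimes[target] for dep in deps)


def build(target: str, rules: Dict[str, List[str]], mtimes: Dict[str, float],
          rebuilt: Optional[List[str]] = None) -> List[str]:
    """Bring dependencies up to date first, then rebuild the target if needed."""
    if rebuilt is None:
        rebuilt = []
    deps = rules.get(target, [])
    for dep in deps:
        build(dep, rules, mtimes, rebuilt)
    if target in rules and outdated(target, deps, mtimes):
        # Stand-in for running the real recipe: stamp the target as newest.
        mtimes[target] = max(mtimes.values()) + 1
        rebuilt.append(target)
    return rebuilt


# Everything is up to date, so nothing is rebuilt.
first_run = build("report.html", rules, mtimes)

# Editing the model-fitting script invalidates the model and the report,
# but not the cleaned data.
mtimes["fit.R"] = 5
second_run = build("report.html", rules, mtimes)

print(first_run)
print(second_run)
```

In this sketch, changing `fit.R` causes only `model.rds` and `report.html` to be rebuilt, while `clean.csv` is left untouched. That kind of selective rerunning is what keeps large pipelines with many inter-dependent files manageable.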
Duration – Approx. 35 hours
ECTS – Equal to 3 ECTS credits
Language – English
This course will be hands-on and workshop based. Throughout each day, there will be some lecture style presentation, i.e., using slides, introducing and explaining key concepts. However, even in these cases, the topics being covered will include practical worked examples that we will work through together.
Assumed quantitative knowledge
Assumed computer background
Equipment and software requirements
A laptop computer with a working version of R or RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers. R may be downloaded by following the links here: https://www.r-project.org/. RStudio may be downloaded by following the links here: https://www.rstudio.com/.
All the R packages that we will use in this course can be downloaded and installed during the workshop itself as and when they are needed, and a full list of required packages will be made available to all attendees prior to the course.
A working webcam is desirable for enhanced interactivity during the live sessions. We encourage attendees to keep their cameras on during live Zoom sessions.
Although not strictly required, using a large monitor, or preferably a second monitor, will improve the learning experience.
PLEASE READ – CANCELLATION POLICY
Cancellations/refunds are accepted as long as the course materials have not been accessed.
There is a 20% cancellation fee to cover administration and possible bank fees.
If you need to discuss cancelling, please contact firstname.lastname@example.org.
If you are unsure about course suitability, please get in touch by email at email@example.com to find out more.
- Dr. Mark Andrews
Senior Lecturer, Psychology Department, Nottingham Trent University, England
- Free 1 day intro to R and RStudio (FIRR)
- Introduction to statistics using R and RStudio (IRRS03)
- Introduction to generalised linear models using R and RStudio (IGLM)
- Introduction to mixed models using R and RStudio (IMMR)
- Nonlinear regression using generalized additive models (GAMR)
- Introduction to hidden Markov and state space models (HMSS)
- Introduction to machine learning and deep learning using R (IMDL)
- Model selection and model simplification (MSMS)
- Data visualization using ggplot2 (R and RStudio) (DVGG)
- Data wrangling using R and RStudio (DWRS)
- Reproducible data science using RMarkdown, Git, R packages, Docker, Make & Drake, and other tools (RDRP)
- Introduction/fundamentals of Bayesian data analysis statistics using R (FBDA)
- Bayesian data analysis (BADA)
- Bayesian approaches to regression and mixed effects models using R and brms (BARM)
- Introduction to Stan for Bayesian data analysis (ISBD)
- Introduction to Unix (UNIX01)
- Introduction to Python (PYIN03)
- Introduction to scientific, numerical, and data analysis programming in Python (PYSC03)
- Machine learning and deep learning using Python (PYML03)
- Python for data science, machine learning, and scientific computing (PDMS02)