

  • This event has passed.

ONLINE COURSE – Reproducible Data Science using RMarkdown, Git, R packages, Docker, Make & Drake, and other tools (RDRP01). This course will be delivered live.

29th June 2020 - 3rd July 2020

This course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.

This is a ‘LIVE COURSE’: the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link, so a good internet connection is essential.

TIME ZONE – Western European Time. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow a day behind, with an additional 1/2 days of support after the official course finish date (please email oliverhooker@psstatistics.com for full details or to discuss how we can accommodate you).

Course Overview:

This course provides a comprehensive introduction to reproducible data analysis, which we define as analysis in which the entire workflow or pipeline is as open and transparent as possible, so that others, including our future selves, can exactly reproduce any of its results. We cover this topic through a thorough introduction to a set of R-based and general computing tools, such as RMarkdown, Git & GitHub, R packages, Docker, GNU Make, and Drake, and show how they can be used together to do reproducible data analysis that can then be shared with others.

After a general introduction on Day 1, where we introduce the core concept of a research compendium, we begin by covering RMarkdown, knitr, and related tools. These are vital tools for reproducible research: they let us produce data analysis reports (articles, slides, posters, websites, etc.) by embedding analysis code (R, Python, etc.) within the text of the report. This code is then executed, and the results it produces are inserted into the final output document.

On Day 2, we provide a comprehensive introduction to version control using Git, including GitHub. Git and GitHub are vital tools for the organization, maintenance, and distribution of our code, especially for large-scale, long-term projects involving multiple collaborators.

On Day 3, we cover how to create, maintain, and distribute R packages. R packages are the principal means of distributing reusable R code in general, and here we also look at how R packages can be used to create, maintain, and distribute research compendia.

On Day 4, we cover Docker, which is now a very popular means of producing reproducible computing environments across different devices, platforms, and operating systems.

On Day 5, we cover build automation tools, particularly GNU Make and Drake, which are used to automatically run complex analysis code involving multiple interdependencies between files. GNU Make is a general-purpose build automation tool, while Drake is designed specifically for complex data analysis pipelines in R.

On each day, therefore, we aim to provide a comprehensive and thorough introduction to a set of valuable and generally useful computing tools, each of which plays a key role in allowing us to do reproducible data science.

To find out more, or to book online via our sister company (PS statistics), use the link below:

ONLINE COURSE – Reproducible Data Science using RMarkdown, Git, R packages, Docker, Make & Drake, and other tools (RDRP01). This course will be delivered live.

Delivered remotely (United Kingdom)
Western European Time, United Kingdom