ONLINE COURSE – Reproducible Data Science using RMarkdown, Git, R packages, Docker, Make & Drake, and other tools (RDRP01)
29 June 2020 - 3 July 2020
This course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.
This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.
TIME ZONE – Western European Time. However, all sessions will be recorded and made available, allowing attendees in different time zones to follow a day behind, with an additional 1/2 days support after the official course finish date (please email email@example.com for full details or to discuss how we can accommodate you).
This course provides a comprehensive introduction to reproducible data analysis, which we define as analysis where the entire workflow or pipeline is as open and transparent as possible, so that others, including our future selves, can exactly reproduce any of its results. We cover this topic through a thorough introduction to a set of R-based and general computing tools, such as RMarkdown, Git & GitHub, R packages, Docker, GNU Make and Drake, and show how they can be used together to do reproducible data analysis that can then be shared with others.
After a general introduction on Day 1, where we introduce the core concept of a research compendium, we begin by covering RMarkdown, knitr and related tools. These are vital tools for reproducible research that allow us to produce data analysis reports (articles, slides, posters, websites, etc.) by embedding analysis code (R, Python, etc.) within the text of the report; the code is executed and the results it produces are inserted into the final output document.
On Day 2, we provide a comprehensive introduction to version control using Git, including using GitHub. Git and GitHub are vital tools for the organization, maintenance, and distribution of our code, especially for large-scale and long-term projects involving multiple collaborators.
On Day 3, we cover how to create, maintain, and distribute R packages. R packages are the principal means of distributing reusable R code, and here we will also look at how they can be used to create, maintain, and distribute research compendia.
On Day 4, we cover Docker, which is now a very popular means of producing reproducible computing environments across different devices, platforms, and operating systems.
On Day 5, we cover build automation tools, particularly GNU Make and Drake, which are used for automatically running complex analysis code that involves multiple inter-dependencies between files. GNU Make is a general-purpose build automation tool, while Drake is specifically designed for complex data analysis pipelines in R.
On each day, therefore, we aim to provide a comprehensive and thorough introduction to a set of valuable and generally useful computing tools, each of which plays a key role in allowing us to do reproducible data science.
To find out more or to book online via our sister company (PS statistics) use the link below…
“The instructors were excellent and clearly were the reasons for my previous comments. They both combined a deep understanding of statistics and ecology at the same level. Any questions or queries I’ve had, were thus first answered with an ecological point of view and then translated into statistical consideration thereby making much more sense on both side. In addition the course was very well organised, the course director and the two instructors were very friendly as well as professional. On the top of learning many useful things, I’ve also had a very good time during the week there.” Clement Garcia,
Spatial ecologist, Centre For Environment, Fisheries & Aquaculture Science (CEFAS), England
(Attended ADVR course)