- This event has passed.
ONLINE COURSE – Data wrangling using R and RStudio (DWRS02)
21st April 2021 - 22nd April 2021
£275.00
Thursday, May 26th, 2021
This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.
TIME ZONE – UTC+2. However, all sessions will be recorded and made available, allowing attendees from different time zones to follow a day behind, with an additional 1/2 days' support after the official course finish date (please email firstname.lastname@example.org for full details or to discuss how we can accommodate you).
In this two-day course, we provide a comprehensive practical introduction to data wrangling using R. In particular, we focus on tools provided by R’s tidyverse, including dplyr, tidyr, purrr, etc. Data wrangling is the art of taking raw and messy data and formatting and cleaning it so that data analysis, visualization, etc. may be performed on it. Done poorly, it can be time consuming, laborious, and error-prone. Fortunately, the tools provided by R’s tidyverse allow us to do data wrangling in a fast, efficient, and high-level manner, which can have dramatic consequences for the ease and speed with which we analyse data.
On Day 1 of this course, having covered how to read data of different types into R, we cover in detail dplyr tools such as select, filter, mutate, etc. Here, we will also cover the pipe operator (%>%) to create data wrangling pipelines that take raw messy data on the one end and return cleaned tidy data on the other.
On Day 2, we cover how to compute descriptive or summary statistics on our data using dplyr’s summarize and group_by functions. We then turn to combining and merging data. Here, we will consider how to concatenate data frames, including concatenating all data files in a folder, as well as cover the powerful SQL-like join operations that allow us to merge information in different data frames. The final topic we will consider is how to “pivot” data from a “wide” to “long” format and back using tidyr’s pivot_longer and pivot_wider.
THIS IS ONE COURSE IN OUR R SERIES – LOOK OUT FOR COURSES WITH THE SAME COURSE IMAGE TO FIND MORE IN THIS SERIES
This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and also throughout the public and private sectors.
Venue – Delivered remotely
Time zone – GMT+0
Availability – TBC
Duration – 2 days
Contact hours – Approx. 15 hours
ECTS – Equal to 1 ECTS credit
Language – English
This course will be largely practical, hands-on, and workshop based. For each topic, there will first be some lecture style presentation, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Then, we will cover how to perform the various statistical analyses using R. Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session. For the breaks between sessions, and between days, optional exercises will be provided. Solutions to these exercises and brief discussions of them will take place after each break.
The course will take place online using Zoom. On each day, the live video broadcasts will occur during UK local time (GMT+0) at 12pm-2pm, 3pm-5pm, and 6pm-8pm.
All sessions will be video recorded and made available to all attendees as soon as possible, hopefully soon after each 2hr session.
If some sessions are not at a convenient time due to different time zones, attendees are encouraged to join as many of the live broadcasts as possible. For example, attendees from North America may be able to join the live sessions from 3pm-5pm and 6pm-8pm, and then catch up with the 12pm-2pm recorded session once it is uploaded. By joining live sessions attendees will be able to benefit from asking questions and having discussions, rather than just watching prerecorded sessions.
At the start of the first day, we will ensure that everyone is comfortable with how Zoom works, and we’ll discuss the procedure for asking questions and raising comments.
Although not strictly required, using a large monitor or preferably even a second monitor will make the learning experience better, as you will be able to see my RStudio and your own RStudio simultaneously.
All the sessions will be video recorded, and made available immediately on a private video hosting website. Any materials, such as slides, data sets, etc., will be shared via GitHub.
Assumed quantitative knowledge
We will assume familiarity with only the most basic of statistical concepts, such as descriptive statistics. We will not even assume that participants will have taken university level courses on statistics.
Assumed computer background
Minimal prior experience with R and RStudio is required. Attendees should be familiar with some basic R syntax and commands, how to write code in the RStudio console and script editor, how to load up data from files, etc.
Equipment and software requirements
Attendees of the course will need to use a computer on which RStudio can be installed. This includes Mac, Windows, and Linux, but not tablets or other mobile devices. Instructions on how to install and configure all the required software, which is all free and open source, will be provided before the start of the course. We will also provide time during the workshops to ensure that all software is installed and configured properly.
UNSURE ABOUT SUITABILITY? THEN PLEASE ASK email@example.com
Equipment and software requirements
Attendees will need to install/update R/RStudio and various additional R packages.
This can be done on Macs, Windows, and Linux.
R – https://cran.r-project.org/
RStudio – https://www.rstudio.com/products/rstudio/download/
PLEASE READ – CANCELLATION POLICY
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact firstname.lastname@example.org. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
Wednesday 21st – Classes from 12:00 to 20:00
Topic 1: Reading in data. We will begin by reading data into R using tools such as readr and readxl. Almost all types of data can be read into R, and here we will consider many of the main types, such as csv, xlsx, sav, etc. Here, we will also consider how to control how data are parsed, e.g., so that they are read as dates, numbers, strings, etc.
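As a rough illustration of controlling how columns are parsed, here is a minimal sketch using readr. The file contents and the column names (id, visit_date, score) are made up for this example and are not taken from the course materials.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,visit_date,score",
  "001,2021-04-21,3.5",
  "002,2021-04-22,4.0"
), csv_path)

# Declare how each column should be parsed instead of relying on guessing:
# id stays a string (keeping its leading zeros), visit_date becomes a Date,
# and score becomes a number.
visits <- read_csv(
  csv_path,
  col_types = cols(
    id = col_character(),
    visit_date = col_date(format = "%Y-%m-%d"),
    score = col_double()
  )
)
```

Without the explicit col_types, readr would guess the types, and a column like id would typically be read as a number and lose its leading zeros.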
Topic 2: Wrangling with dplyr. For the remainder of Day 1, we will cover the very powerful dplyr R package. This package supplies a number of so-called “verbs” (select, rename, slice, filter, mutate, arrange, etc.), each of which focuses on a key data manipulation task, such as selecting or changing variables. These verbs also have _if, _at, _all variants that we will consider. All of these verbs can be chained together using “pipes” (represented by %>%). Together, these create powerful data wrangling pipelines that take raw data as input and return cleaned data as output. Here, we will also learn about the key concept of “tidy data”, which is roughly where each row of a data frame is an observation and each column is a variable.
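A pipeline of this kind might look like the following minimal sketch. The data frame raw_df and its columns are hypothetical, chosen only to show a few verbs chained with %>%.

```r
library(dplyr)

# A small, made-up data frame standing in for raw data.
raw_df <- tibble(
  subject = c("a", "b", "c", "d"),
  height_cm = c(170, 165, NA, 180),
  weight_kg = c(70, 60, 80, 90),
  notes = c("", "late", "", "")
)

# Each verb takes a data frame and returns a data frame,
# so the steps can be chained with the pipe.
clean_df <- raw_df %>%
  select(subject, height_cm, weight_kg) %>%          # keep only the variables we need
  filter(!is.na(height_cm)) %>%                      # drop rows with a missing height
  mutate(bmi = weight_kg / (height_cm / 100)^2) %>%  # add a derived variable
  arrange(desc(bmi))                                 # sort rows, largest bmi first
```

Here clean_df has three rows (subject "c" is dropped for its missing height) and one new column, bmi.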
Thursday 22nd – Classes from 12:00 to 20:00
Topic 3: Summarizing data. The summarize and group_by tools in dplyr can be used with great effect to summarize data using descriptive statistics.
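For instance, grouped descriptive statistics could be computed along these lines. This is a sketch on made-up data; the names scores_df, group, and score are assumptions for illustration.

```r
library(dplyr)

# Made-up scores for two groups.
scores_df <- tibble(
  group = c("control", "control", "treatment", "treatment"),
  score = c(10, 20, 30, 50)
)

# group_by marks the grouping variable; summarize then collapses
# each group to a single row of summary statistics.
score_summary <- scores_df %>%
  group_by(group) %>%
  summarize(
    n = n(),                  # number of rows in the group
    mean_score = mean(score), # group mean
    sd_score = sd(score)      # group standard deviation
  )
```

The result has one row per group, with the mean score being 15 for control and 40 for treatment.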
Topic 4: Merging and joining data frames. There are multiple ways to combine data frames, with the simplest being “bind” operations, which are effectively horizontal or vertical concatenations. Much more powerful are the SQL-like “join” operations. Here, we will consider the inner_join, left_join, right_join, and full_join operations. In this section, we will also consider how to use purrr to read in and automatically merge large sets of files.
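The differences between these operations can be sketched on two small, made-up data frames. The names people, scores, and more_people, as well as the temporary folder of CSV files, are hypothetical and exist only to keep the example self-contained.

```r
library(dplyr)
library(purrr)
library(readr)

people <- tibble(id = c(1, 2, 3), name = c("Ann", "Bob", "Cat"))
scores <- tibble(id = c(2, 3, 4), score = c(80, 90, 70))

# Joins match rows on a key column and differ in which unmatched rows they keep.
inner <- inner_join(people, scores, by = "id")  # only ids present in both: 2 and 3
left  <- left_join(people, scores, by = "id")   # all rows of people; score is NA for id 1
full  <- full_join(people, scores, by = "id")   # all ids from either side: 1, 2, 3, 4

# A "bind" is a plain concatenation; bind_rows stacks data frames vertically.
more_people <- tibble(id = 5, name = "Dan")
stacked <- bind_rows(people, more_people)

# Reading and stacking every CSV file in a folder with purrr.
# The folder is created here with two small files for demonstration.
data_dir <- tempfile("csv_demo_")
dir.create(data_dir)
write_csv(people, file.path(data_dir, "part1.csv"))
write_csv(more_people, file.path(data_dir, "part2.csv"))

csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
all_people <- csv_files %>%
  map(function(path) read_csv(path, col_types = cols())) %>%  # one data frame per file
  bind_rows()                                                 # stack them into one
```

A right_join works like left_join with the roles of the two data frames swapped.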
Topic 5: Pivoting data. Sometimes we need to change data frames between “long” and “wide” formats. The R package tidyr provides the tools pivot_longer and pivot_wider for doing this.
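A round trip between the two formats might look like this minimal sketch. The data frame wide_df and its columns are invented for illustration.

```r
library(tidyr)

# Made-up "wide" data: one row per subject, one column per time point.
wide_df <- tibble(
  subject = c("a", "b"),
  time_1 = c(5, 6),
  time_2 = c(7, 8)
)

# Wide to long: the two time columns become a pair of name/value columns,
# giving one row per subject per time point.
long_df <- wide_df %>%
  pivot_longer(
    cols = c(time_1, time_2),
    names_to = "time",
    values_to = "score"
  )

# Long to wide: spread the values back out into one column per time point.
wide_again <- long_df %>%
  pivot_wider(names_from = time, values_from = score)
```

In this sketch, long_df has four rows and the columns subject, time, and score, while wide_again has the same shape as wide_df.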
- Dr. Mark Andrews
Senior Lecturer, Psychology Department, Nottingham Trent University, England
- Free 1 day intro to R and RStudio (FIRR)
- Introduction to statistics using R and RStudio (IRRS03)
- Introduction to generalised linear models using R and RStudio (IGLM)
- Introduction to mixed models using R and RStudio (IMMR)
- Nonlinear regression using generalized additive models (GAMR)
- Introduction to hidden Markov and state space models (HMSS)
- Introduction to machine learning and deep learning using R (IMDL)
- Model selection and model simplification (MSMS)
- Data visualization using ggplot2 (R and RStudio) (DVGG)
- Data wrangling using R and RStudio (DWRS)
- Reproducible data science using RMarkdown, Git, R packages, Docker, Make & drake, and other tools (RDRP)
- Introduction/fundamentals of Bayesian data analysis statistics using R (FBDA)
- Bayesian data analysis (BADA)
- Bayesian approaches to regression and mixed effects models using R and brms (BARM)
- Introduction to Stan for Bayesian data analysis (ISBD)
- Introduction to Unix (UNIX01)
- Introduction to Python (PYIN03)
- Introduction to scientific, numerical, and data analysis programming in Python (PYSC03)
- Machine learning and deep learning using Python (PYML03)
- Python for data science, machine learning, and scientific computing (PDMS02)