Data Wrangling Using R And RStudio (DWRSPR)

Name: Data Wrangling Using R And RStudio (DWRSPR)
Start: 2030-01-01T00:00:00+00:00
End: 2030-01-01T23:59:59+00:00
Location: Recorded

1st January 2030

£250.00

Course Format

Pre-recorded

About This Course

In this two-day course, we provide a comprehensive practical introduction to data wrangling using R. In particular, we focus on tools provided by R’s tidyverse, including dplyr, tidyr, and purrr. Data wrangling is the art of taking raw and messy data and formatting and cleaning it so that data analysis, visualization, and so on can be performed on it. Done poorly, it can be time-consuming, laborious, and error-prone. Fortunately, the tools provided by R’s tidyverse allow us to do data wrangling in a fast, efficient, and high-level manner, which can have dramatic consequences for the ease and speed with which we analyse data.

On Day 1 of this course, having covered how to read data of different types into R, we cover in detail the dplyr tools such as select, filter, and mutate. Here, we will also cover the pipe operator (%>%), which is used to create data wrangling pipelines that take raw messy data in at one end and return cleaned tidy data at the other.

On Day 2, we cover how to compute descriptive or summary statistics on our data using dplyr’s summarize and group_by functions. We then turn to combining and merging data. Here, we will consider how to concatenate data frames, including concatenating all data files in a folder, and cover the powerful SQL-like join operations that allow us to merge information in different data frames. The final topic we will consider is how to “pivot” data from a “wide” to a “long” format and back using tidyr’s pivot_longer and pivot_wider.

Intended Audiences

This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and also throughout the public and private sectors.

Course Details

Last updated – 22/04/2021

Duration – Approx. 15 hours

ECTS – Equal to 1 ECTS credit

Language – English

Teaching Format

This course will be largely practical, hands-on, and workshop-based. For each topic, there will first be some lecture-style presentation, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Then, we will cover how to perform the various statistical analyses using R. Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session. For the breaks between sessions, and between days, optional exercises will be provided. Solutions to these exercises will be presented and briefly discussed after each break.

Although not strictly required, using a large monitor or preferably even a second monitor will make the learning experience better, as you will be able to see my RStudio and your own RStudio simultaneously.

All the sessions will be video recorded, and made available immediately on a private video hosting website. Any materials, such as slides, data sets, etc., will be shared via GitHub.

Assumed quantitative knowledge

We will assume familiarity with only the most basic of statistical concepts, such as descriptive statistics. We will not even assume that participants have taken university-level courses on statistics.

Assumed computer background

Minimal prior experience with R and RStudio is required. Attendees should be familiar with some basic R syntax and commands, how to write code in the RStudio console and script editor, how to load up data from files, etc.

Equipment and software requirements

A laptop computer with working versions of R and RStudio is required. R and RStudio are both available as free and open-source software for PCs, Macs, and Linux computers. R may be downloaded by following the links here: https://www.r-project.org/. RStudio may be downloaded by following the links here: https://www.rstudio.com/.

All the R packages that we will use in this course can be downloaded and installed during the workshop itself as and when they are needed, and a full list of required packages will be made available to all attendees prior to the course.

A working webcam is desirable for enhanced interactivity during the live sessions. We encourage attendees to keep their cameras on during live Zoom sessions.

Although not strictly required, using a large monitor or preferably even a second monitor will improve the learning experience.

PLEASE READ – CANCELLATION POLICY

Cancellations/refunds are accepted as long as the course materials have not been accessed.

There is a 20% cancellation fee to cover administration and possible bank fees.

If you need to discuss cancelling, please contact oliverhooker@prstatistics.com.

If you are unsure about course suitability, please get in touch by email to find out more: oliverhooker@prstatistics.com

COURSE PROGRAMME

Day 1

Approx. 6 Hours

Topic 1: Reading in data. We will begin by reading data into R using tools such as readr and readxl. Almost all types of data can be read into R, and here we will consider many of the main types, such as csv, xlsx, sav, etc. Here, we will also consider how to control how data are parsed, e.g., so that they are read as dates, numbers, strings, etc.
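As a rough illustration of controlling how columns are parsed, here is a minimal sketch using readr. The inline CSV text and its column names (id, visit_date, score) are hypothetical stand-ins for a real data file, and the use of I() to pass literal text assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline CSV text standing in for a raw data file.
raw_csv <- "id,visit_date,score\n1,2021-04-22,3.5\n2,2021-04-23,4.0\n"

# Tell readr explicitly how each column should be parsed,
# rather than relying on its automatic type guessing.
visits_df <- read_csv(
  I(raw_csv),
  col_types = cols(
    id = col_character(),                        # keep identifiers as strings
    visit_date = col_date(format = "%Y-%m-%d"),  # parse as dates
    score = col_double()                         # parse as numbers
  )
)

print(visits_df)
```

With a real file, the first argument would simply be the path to that file instead of I(raw_csv).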

Topic 2: Wrangling with dplyr. For the remainder of Day 1, we will cover the very powerful dplyr R package. This package supplies a number of so-called “verbs” (select, rename, slice, filter, mutate, arrange, etc.), each of which focuses on a key data manipulation task, such as selecting or changing variables. All of these verbs can be chained together using “pipes” (represented by %>%). Together, these create powerful data wrangling pipelines that take raw data as input and return cleaned data as output. Here, we will also learn about the key concept of “tidy data”, which is roughly where each row of a data frame is an observation and each column is a variable.
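To give a flavour of such a pipeline, the following is a small sketch that chains several dplyr verbs with %>%. The data frame and its variable names (subject, rt_ms, condition) are invented for illustration only.

```r
library(dplyr)

# Hypothetical raw data: one row per subject, with a missing reaction time.
raw_df <- data.frame(
  subject = c("a", "b", "c", "d"),
  rt_ms = c(512, 430, NA, 688),
  condition = c("x", "y", "x", "y")
)

clean_df <- raw_df %>%
  filter(!is.na(rt_ms)) %>%                 # drop rows with missing values
  mutate(rt_sec = rt_ms / 1000) %>%         # create a new variable
  rename(participant = subject) %>%         # rename a variable
  select(participant, condition, rt_sec) %>% # keep only the variables we need
  arrange(rt_sec)                           # sort rows by reaction time

print(clean_df)
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be read from top to bottom as a single pipeline.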

Day 2

Approx. 6 Hours

Topic 3: Summarizing data. The summarize and group_by tools in dplyr can be used with great effect to summarize data using descriptive statistics.
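A minimal sketch of grouped summaries, assuming a made-up data frame with a grouping variable and a numeric variable (the .groups argument assumes dplyr 1.0 or later):

```r
library(dplyr)

# Hypothetical data: numeric values observed in two groups.
scores_df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

summary_df <- scores_df %>%
  group_by(group) %>%
  summarize(
    n = n(),                   # number of observations per group
    mean_value = mean(value),  # group mean
    sd_value = sd(value),      # group standard deviation
    .groups = "drop"           # return an ungrouped data frame
  )

print(summary_df)
```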

Topic 4: Merging and joining data frames. There are multiple ways to combine data frames, with the simplest being “bind” operations, which are effectively horizontal or vertical concatenations. Much more powerful are the SQL-like “join” operations. Here, we will consider the inner_join, left_join, right_join, and full_join operations. In this section, we will also consider how to use purrr to read in and automatically merge large sets of files.
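The sketch below illustrates the difference between some of the joins, and one possible way of reading and stacking every csv file in a folder with purrr. The data frames, file names, and temporary folder are all hypothetical and are created only so that the example is self-contained.

```r
library(dplyr)
library(purrr)
library(readr)

# Two hypothetical data frames that share an "id" key.
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

inner <- inner_join(people, scores, by = "id")  # only ids found in both
left  <- left_join(people, scores, by = "id")   # all rows of people
full  <- full_join(people, scores, by = "id")   # all ids from either

# Write two small csv files to a temporary folder to stand in for a
# folder of raw data files.
csv_dir <- tempfile("csv_folder_")
dir.create(csv_dir)
write_csv(people[1:2, ], file.path(csv_dir, "part1.csv"))
write_csv(people[3, ], file.path(csv_dir, "part2.csv"))

# List every csv file in the folder, read each one, and stack the results.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_people <- csv_files %>%
  map(read_csv, col_types = cols(id = col_double(), name = col_character())) %>%
  bind_rows()

print(all_people)
```

Here bind_rows performs the vertical concatenation, while the joins match rows across data frames using the shared id variable.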

Topic 5: Pivoting data. Sometimes we need to change data frames from “long” to “wide” formats, or vice versa. The R package tidyr provides the tools pivot_longer and pivot_wider for doing this.
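As a brief sketch of pivoting in both directions, consider a hypothetical wide data frame with one column per time point (the names id, t1, t2, time, and score are invented for illustration):

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per time point.
wide_df <- data.frame(id = c(1, 2), t1 = c(5, 6), t2 = c(7, 8))

# Wide to long: one row per id and time point.
long_df <- pivot_longer(
  wide_df,
  cols = c(t1, t2),
  names_to = "time",
  values_to = "score"
)

# Long back to wide: one column per time point again.
wide_again <- pivot_wider(long_df, names_from = time, values_from = score)

print(long_df)
print(wide_again)
```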

Course Instructor

Dr. Mark Andrews
Works at: Senior Lecturer, Psychology Department, Nottingham Trent University, England
Teaches:
- Free 1 day intro to R and RStudio (FIRR)
- Introduction to statistics using R and RStudio (IRRS03)
- Introduction to generalised linear models using R and RStudio (IGLM)
- Introduction to mixed models using R and RStudio (IMMR)
- Nonlinear regression using generalized additive models (GAMR)
- Introduction to hidden Markov and state space models (HMSS)
- Introduction to machine learning and deep learning using R (IMDL)
- Model selection and model simplification (MSMS)
- Data visualization using ggplot2 (R and RStudio) (DVGG)
- Data wrangling using R and RStudio (DWRS)
- Reproducible data science using R Markdown, Git, R packages, Docker, Make & drake, and other tools (RDRP)
- Introduction/fundamentals of Bayesian data analysis statistics using R (FBDA)
- Bayesian data analysis (BADA)
- Bayesian approaches to regression and mixed effects models using R and brms (BARM)
- Introduction to Stan for Bayesian data analysis (ISBD)
- Introduction to Unix (UNIX01)
- Introduction to Python (PYIN03)
- Introduction to scientific, numerical, and data analysis programming in Python (PYSC03)
- Machine learning and deep learning using Python (PYML03)
- Python for data science, machine learning, and scientific computing (PDMS02)

Details

Date: 1st January 2030
Cost: £250.00
Event Category: Previously Recorded Courses
Event Tags: ONLINE COURSE

Organiser

Oliver Hooker (Course Organiser)

Venue

Recorded
United Kingdom
