- This event has passed.

# ONLINE COURSE – Introduction to Data Wrangling and Data Visualization using R (DWDV01)

## 4th October 2021 - 8th October 2021

Event Date

## Monday, January 10th, 2022

## Course Format

This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.

## Course Program

TIME ZONE – GMT+1. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow.

Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.

##### Course Details

Bayesian methods are now increasingly widely used in data analysis across most scientific research fields. Because Bayesian methods differ conceptually and theoretically from the classical statistical counterparts traditionally taught in statistics courses, many researchers have had little opportunity to learn their fundamentals, which makes using Bayesian data analysis in practice more challenging. The aim of this course is to provide a solid introduction to Bayesian methods, both theoretically and practically. We will begin by teaching the fundamental concepts of Bayesian inference and Bayesian modelling, including how Bayesian methods differ from their classical counterparts, and show how to do Bayesian data analysis in practice in R. We then provide a solid introduction to Bayesian approaches to regression modelling using R and the brms package. We begin by covering Bayesian approaches to linear regression. We will then proceed to Bayesian approaches to generalized linear models, including binary logistic regression, ordinal logistic regression, Poisson regression, zero-inflated models, etc. Finally, we will cover Bayesian approaches to multilevel and mixed effects models. Throughout this course, we will be using, via the brms package, Stan-based Markov Chain Monte Carlo (MCMC) methods.

##### Intended Audiences

Coming soon…

##### Venue

Delivered remotely

##### Course Information

Availability – 30 places

Duration – 5 days

Contact hours – Approx. 35 hours

ECTS – Equal to 3 ECTS credits

Language – English

##### Teaching Format

There will be morning lectures based on the modules outlined in the course timetable. In the afternoon there will be practicals based on the topics covered that morning. Data sets for computer practicals will be provided by the instructors, but participants are welcome to bring their own data.

##### Assumed quantitative knowledge

A basic understanding of statistical concepts, specifically generalised linear regression models, statistical significance, and hypothesis testing.

##### Assumed computer background

Familiarity with R, including the ability to import/export data, manipulate data frames, fit basic statistical models, and generate simple exploratory and diagnostic plots.

##### Equipment and software requirements

Attendees will need to install/update R/RStudio and various additional R packages.

This can be done on Macs, Windows, and Linux.

R – https://cran.r-project.org/

RStudio – https://www.rstudio.com/products/rstudio/download/

**PLEASE READ – CANCELLATION POLICY**

Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.

**If you are unsure about course suitability, please get in touch by email to find out more**

**COURSE PROGRAMME**

**Monday 10th –** Classes from 12:00-17:00

• Topic 1: We will begin with an overview of what Bayesian data analysis is in essence and how it fits into statistics as it is practiced generally. Our main point here will be that Bayesian data analysis is effectively an alternative school of statistics to the traditional approach, which is referred to variously as the classical, sampling theory based, or frequentist approach, rather than being a specialized or advanced statistics topic. However, there is no real necessity to see these two general approaches as mutually exclusive and in direct competition, and a pragmatic blend of both is entirely possible.

• Topic 2: Introducing Bayes’ rule. Bayes’ rule can be described as a means to calculate the probability of causes from some known effects. As such, it can be used as a means for performing statistical inference. In this section of the course, we will work through some simple and intuitive calculations using Bayes’ rule. Ultimately, all of Bayesian data analysis is based on an application of these methods to more complex statistical models, and so understanding these simple cases of the application of Bayes’ rule can help provide a foundation for the more complex cases.

• Topic 3: Bayesian inference in a simple statistical model. In this section, we will work through a classic statistical inference problem, namely inferring the number of red marbles in an urn of red and black marbles, or equivalent problems. This problem is easy to analyse completely using just R, yet allows us to delve into the key concepts of Bayesian statistics, including the likelihood function, prior distributions, posterior distributions, maximum a posteriori estimation, highest posterior density intervals, posterior predictive intervals, marginal likelihoods, Bayes factors, and model evaluation by out-of-sample generalization.
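As a small illustration of the kind of calculation covered in Topics 2 and 3, consider a hypothetical urn of N = 4 marbles, of which an unknown number θ (from 0 to 4) are red. The numbers are assumptions chosen for easy arithmetic and are not taken from the course materials. Suppose we draw n = 2 marbles with replacement, observe k = 1 red, and place a uniform prior on θ. Bayes’ rule then gives:

$$
P(\theta \mid k) = \frac{P(k \mid \theta)\, P(\theta)}{\sum_{\theta'=0}^{4} P(k \mid \theta')\, P(\theta')},
\qquad
P(k = 1 \mid \theta) = \binom{2}{1}\,\frac{\theta}{4}\left(1 - \frac{\theta}{4}\right) = \frac{\theta\,(4 - \theta)}{8}.
$$

With the uniform prior $P(\theta) = 1/5$ cancelling from numerator and denominator, the likelihoods $0, \tfrac{3}{8}, \tfrac{4}{8}, \tfrac{3}{8}, 0$ sum to $\tfrac{10}{8}$, so

$$
P(\theta \mid k = 1) = \frac{\theta\,(4 - \theta)}{10} = (0,\ 0.3,\ 0.4,\ 0.3,\ 0) \quad \text{for } \theta = 0, 1, 2, 3, 4 .
$$

The maximum a posteriori estimate in this toy case is therefore θ = 2 red marbles.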

**Tuesday 11th –** Classes from 12:00-17:00

• Topic 4: Bayesian analysis of normal models. Statistical models based on linear relationships and the normal distribution are a mainstay of statistical analyses in general. They encompass models such as linear regression, Pearson’s correlation, t-tests, ANOVA, ANCOVA, and so on. In this section, we will describe how to do Bayesian analysis of normal linear models, focusing on simple examples. One of the aims of this section is to identify some important and interesting parallels between Bayesian and classical or frequentist analyses. This shows how Bayesian and classical analyses can be seen as ultimately providing two different perspectives on the same problem.

• Topic 5: The previous section provides a so-called analytical approach to linear and normal models. This is where we can calculate desired quantities and distributions by way of simple formulae. However, analytical approaches to Bayesian analyses are only possible in a relatively restricted set of cases. On the other hand, numerical methods, specifically Markov Chain Monte Carlo (MCMC) methods, can be applied to virtually any Bayesian model. In this section, we will re-perform the analysis presented in the previous section, this time using MCMC methods. For this, we will use the brms package in R, which provides an exceptionally easy-to-use interface to Stan.

**Wednesday 12th –** Classes from 12:00-17:00

• Topic 6: Bayesian linear models. We begin by covering Bayesian linear regression. For this, we will use the brm command from the brms package, and we will compare and contrast the results with the standard lm command. By comparing and contrasting brm with lm, we will see all the major similarities and differences between the Bayesian and classical approaches to linear regression. We will, for example, see how Bayesian inference and model comparison work in practice and how they differ conceptually and practically from inference and model comparison in classical regression. As part of this coverage of linear models, we will also use categorical predictor variables and explore varying intercept and varying slope linear models.

**Thursday 13th –** Classes from 12:00-17:00

• Topic 7: Extending Bayesian linear models. Classical normal linear models are based on strong assumptions that do not always hold in practice. For example, they assume a normal distribution of the residuals, and assume homogeneity of variance of this distribution across all values of the predictors. In Bayesian models, these assumptions are easily relaxed. For example, we will see how we can easily replace the normal distribution of the residuals with a t-distribution, which will allow for a regression model that is robust to outliers. Likewise, we can model the variance of the residuals as being dependent on values of predictor variables.

• Topic 8: Bayesian generalized linear models. Generalized linear models include models such as logistic regression, including multinomial and ordinal logistic regression, Poisson regression, negative binomial regression, zero-inflated models, and other models. Again, for these analyses we will use the brms package and explore this wide range of models using real-world data sets. In our coverage of this topic, we will see how powerful Bayesian methods are, allowing us to easily extend our models in different ways in order to handle a variety of problems and to use the assumptions that are most appropriate for the data being modelled.
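In symbols, the two extensions described in Topic 7 can be sketched as follows. The notation is assumed here for illustration (outcome $y_i$, a single predictor $x_i$, coefficients $\beta$ and $\gamma$, degrees of freedom $\nu$), and the log link for the scale is one common choice rather than the only one:

$$
y_i \sim \text{Student-}t(\nu,\ \mu_i,\ \sigma_i), \qquad \mu_i = \beta_0 + \beta_1 x_i, \qquad \log \sigma_i = \gamma_0 + \gamma_1 x_i .
$$

Setting $\gamma_1 = 0$ and letting $\nu \to \infty$ recovers the classical normal linear model with constant residual variance, so the extended model contains the standard one as a special case.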

**Friday 14th –** Classes from 12:00-17:00

• Topic 9: Multilevel and mixed models. In this section, we will cover the multilevel and mixed effects variants of the regression models (linear, logistic, Poisson, etc.) that we have covered so far. In general, multilevel and mixed effects models arise whenever data are correlated due to membership of a group (or group of groups, and so on). For this, we use a wide range of real-world data sets and problems, and move between linear, logistic, and other models as we explore these analyses. We will pay particular attention to considering when and how to use varying slope and varying intercept models, and how to choose between maximal and minimal models. We will also see how Bayesian approaches to multilevel and mixed effects models can overcome some of the technical problems (e.g. lack of model convergence) that beset classical approaches.
