
ONLINE COURSE – Model selection and model simplification (MSMS04) This course will be delivered live

9 January 2024 - 11 January 2024

£250.00 – £720.00

Event Date

Tuesday, January 9th, 2024

COURSE FORMAT

This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.

COURSE PROGRAMME

TIME ZONE – Central Time Zone – however, all sessions will be recorded and made available, allowing attendees from different time zones to follow.

Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.

About This Course

This three-day course covers the important and general topics of statistical model building, model evaluation, model selection, model comparison, model simplification, and model averaging. These topics are vitally important to almost every type of statistical analysis, yet they are often poorly or incompletely understood.

We begin by considering the fundamental issue of how to measure model fit and a model’s predictive performance, and discuss a wide range of other major model fit measurement concepts such as likelihood, log likelihood, deviance, and residual sums of squares. We then turn to nested model comparison, particularly in general and generalized linear models and their mixed effects counterparts. We then consider the key concept of out-of-sample predictive performance, and discuss over-fitting, whereby excellent fits to the observed data can lead to very poor generalization performance. As part of this discussion of out-of-sample generalization, we introduce leave-one-out cross-validation and the Akaike Information Criterion (AIC).

We then cover general concepts and methods related to variable selection, including stepwise regression, ridge regression, the lasso, and elastic nets. Following this, we turn to model averaging, an alternative that is arguably always preferable to model selection. Finally, we cover Bayesian methods of model comparison. Here, we describe how Bayesian methods allow us to easily compare completely distinct statistical models using a common metric. We also describe how Bayesian methods allow us to fit all the candidate models of potential interest, including cases where traditional methods fail.

Intended Audiences

This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, as well as throughout the public and private sectors.

Venue

Delivered remotely.

Course Details

Time zone – GMT+1

Availability – TBC

Duration – 3 x 1/2 days

Contact hours – Approx. 12 hours

ECTS – Equal to 1 ECTS credit

Language – English

Teaching Format

Assumed computer background

Equipment and software requirements

A laptop computer with a working version of R or RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers.

Participants should be able to install additional software on their own computer during the course (please make sure you have administration rights to your computer).
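
For example, additional R packages are installed from within R itself, as sketched below. The particular packages named here are our guess at what a course on this material might use, not an official requirement list – please wait for the instructor’s instructions.

    # Hypothetical package list -- check with the instructor before the course
    install.packages(c("lme4",    # mixed effects models
                       "glmnet",  # ridge regression, lasso, and elastic nets
                       "brms",    # Bayesian regression models via Stan
                       "loo"))    # WAIC and LOOIC model comparison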

A large monitor and a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase the interaction with the instructor and other students.

Tickets

Tickets are no longer available

Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Later cancellations may be considered; contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, course fees will be refunded in full.

If you are unsure about course suitability, please get in touch by email (oliverhooker@prstatistics.com) to find out more.

 

COURSE PROGRAMME

Tuesday 9th

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 1

Topic 1: Measuring model fit. In order to introduce the general topic of model evaluation, selection, comparison, etc., it is necessary to understand the fundamental issue of how we measure model fit. Here, the concept of the conditional probability of the observed data, or of future data, is of vital importance. This is intimately related to, though distinct from, the concept of likelihood and the likelihood function, which is in turn related to the concept of the log likelihood or deviance of a model. Here, we also show how these concepts are related to residual sums of squares, root mean square error (rmse), and deviance residuals.
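
As a quick illustration of these quantities (a sketch of our own on a built-in dataset, not course material), note that for an ordinary linear model the deviance coincides with the residual sum of squares:

    # Fit a simple linear model to a built-in dataset
    fit <- lm(mpg ~ wt, data = mtcars)

    # Log likelihood of the model given the observed data
    logLik(fit)

    # For a Gaussian linear model, deviance equals the residual sum of squares
    deviance(fit)
    sum(residuals(fit)^2)

    # Root mean square error (rmse)
    sqrt(mean(residuals(fit)^2))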

Topic 2: Nested model comparison. In this section, we cover how to do nested model comparison in general linear models, generalized linear models, and their mixed effects (multilevel) counterparts. First, we precisely define what is meant by a nested model. Then we show how nested model comparison can be accomplished in general linear models with F tests, which we will also discuss in relation to R^2 and adjusted R^2. In generalized linear models and mixed effects models, we can accomplish nested model comparison using deviance-based chi-square tests via Wilks’s theorem.
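
For orientation, both kinds of test are available through base R’s anova(); the snippet below is an illustrative sketch on a built-in dataset, not course material:

    # Nested comparison in a general linear model: F test
    m0 <- lm(mpg ~ wt, data = mtcars)
    m1 <- lm(mpg ~ wt + hp, data = mtcars)   # m0 is nested in m1
    anova(m0, m1)                            # F test of the extra term

    # Compare adjusted R^2 across the two fits
    summary(m0)$adj.r.squared
    summary(m1)$adj.r.squared

    # Nested comparison in a generalized linear model: deviance chi-square test
    g0 <- glm(am ~ wt, data = mtcars, family = binomial)
    g1 <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    anova(g0, g1, test = "Chisq")            # likelihood ratio test (Wilks)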

Wednesday 10th

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 2

Topic 3: Out-of-sample predictive performance: cross-validation and information criteria. In the previous sections, the focus was largely on how well a model fits or predicts the observed data. For reasons discussed in this section, related to the concept of overfitting, this can be a misleading and possibly even meaningless means of model evaluation. Here, we describe how to measure out-of-sample predictive performance, which measures how well a model can generalize to new data. This is arguably the gold standard for evaluating any statistical model. A practical means of measuring out-of-sample predictive performance is cross-validation, especially leave-one-out cross-validation. In relatively simple models, leave-one-out cross-validation can be approximated by the Akaike Information Criterion (AIC), which can be exceptionally simple to calculate. We will discuss how to interpret AIC values, and describe other related information criteria, some of which will be used in more detail in later sections.
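
To make the distinction concrete, the sketch below hand-rolls leave-one-out cross-validation and compares it with AIC; the helper loo_rmse is a hypothetical function of our own, not something from the course:

    # Candidate models: a simple fit, and a flexible one that risks overfitting
    m1 <- lm(mpg ~ wt, data = mtcars)
    m2 <- lm(mpg ~ poly(wt, 5), data = mtcars)

    # Leave-one-out cross-validation by hand: refit n times, each time
    # predicting the single held-out observation
    loo_rmse <- function(formula, data) {
      errs <- sapply(seq_len(nrow(data)), function(i) {
        fit <- lm(formula, data = data[-i, ])
        obs <- data[[all.vars(formula)[1]]][i]
        obs - predict(fit, newdata = data[i, , drop = FALSE])
      })
      sqrt(mean(errs^2))
    }
    loo_rmse(mpg ~ wt, mtcars)
    loo_rmse(mpg ~ poly(wt, 5), mtcars)

    # AIC as a cheap approximation to out-of-sample performance (lower is better)
    AIC(m1, m2)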

Topic 4: Variable selection. Variable selection is a type of nested model comparison. It is also one of the most widely used model selection methods, and some form of variable selection is done routinely in almost all data analyses. Although variable selection will also have been discussed as part of Topic 2 above, we discuss the topic in more detail here. In particular, we cover stepwise regression (and its limitations), all-subsets methods, ridge regression, the lasso, and elastic nets.
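
As a hedged sketch of what these methods look like in practice: base R provides step() for stepwise regression, and the glmnet package (an assumption on our part about tooling, though a standard one) implements ridge, lasso, and elastic net fits:

    # Stepwise regression guided by AIC (backwards elimination by default)
    full <- lm(mpg ~ ., data = mtcars)
    step(full, trace = 0)

    # Penalized regression with the glmnet package (assumed installed)
    library(glmnet)
    x <- as.matrix(mtcars[, -1])  # predictors
    y <- mtcars$mpg               # response

    # alpha = 0 is ridge, alpha = 1 is the lasso, 0 < alpha < 1 is an elastic net
    cv_lasso <- cv.glmnet(x, y, alpha = 1)   # penalty chosen by cross-validation
    coef(cv_lasso, s = "lambda.min")         # some coefficients shrunk to zero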

Thursday 11th

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 3

Topic 5: Model averaging. Rather than selecting one model from a set of candidates, it is arguably always better to perform model averaging, using all the candidate models weighted by their predictive performance. We show how to perform model averaging using information criteria.
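
One standard information-criterion recipe is Akaike weights, in which each candidate’s weight decays exponentially with its AIC difference from the best model. The sketch below is our own illustration, not the course’s prescribed workflow:

    # Candidate models
    models <- list(
      lm(mpg ~ wt,           data = mtcars),
      lm(mpg ~ wt + hp,      data = mtcars),
      lm(mpg ~ wt + hp + am, data = mtcars)
    )

    # Akaike weights: relative predictive weight of each candidate
    aic   <- sapply(models, AIC)
    delta <- aic - min(aic)
    w     <- exp(-delta / 2) / sum(exp(-delta / 2))

    # Model-averaged prediction: weighted mean of the candidates' predictions
    preds <- sapply(models, predict)   # one column of predictions per model
    averaged <- as.vector(preds %*% w)
    head(averaged)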

Topic 6: Bayesian model comparison methods. Bayesian methods afford much greater flexibility and extensibility for model building than traditional methods. They also allow us to directly compare completely unrelated statistical models of the same data using information criteria such as WAIC and LOOIC. Here, we will also discuss how Bayesian methods allow us to fit all models of potential interest to us, including cases where model fitting is computationally intractable using traditional methods (e.g., where optimization fails to converge). This therefore allows us to consider all models of potential interest, rather than just focusing on the limited subset where the traditional fitting algorithms succeed.
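
In R, one of several interfaces exposing these criteria is the brms package together with the loo package; that tooling choice, and the models below, are illustrative assumptions rather than course material. Note that brms requires a working Stan toolchain.

    # A minimal sketch, assuming the brms package; whether this course uses
    # brms is an assumption on our part
    library(brms)

    # Two structurally different models of the same data
    b1 <- brm(mpg ~ wt,      data = mtcars)
    b2 <- brm(mpg ~ log(wt), data = mtcars)

    # WAIC and LOOIC put completely distinct models on a common metric
    waic(b1)
    waic(b2)
    loo_compare(loo(b1), loo(b2))  # the preferred model appears in the first row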

 

Course Instructor

Dr. Rafael De Andrade Moral

  • Rafael is an Associate Professor of Statistics at Maynooth University, Ireland. With a background in Biology and a PhD in Statistics from the University of São Paulo, Rafael has a deep passion for teaching and conducting research in statistical modelling applied to Ecology, Wildlife Management, Agriculture, and Environmental Science. As director of the Theoretical and Statistical Ecology Group, Rafael brings together a community of researchers who use mathematical and statistical tools to better understand the natural world. As an alternative teaching strategy, Rafael has been producing music videos and parodies to promote Statistics on social media and in the classroom. His personal webpage can be found here

ResearchGate
GoogleScholar
ORCID
GitHub

Details

Start:
9 January 2024
End:
11 January 2024
Cost:
£250.00 – £720.00

Venue

Delivered remotely (United Kingdom)
Western European Time Zone, United Kingdom
