Introduction to Mixed (Hierarchical) models for biologists using R (IMBR01)
14 May 2018 - 18 May 2018
£300.00 - £550.00
Mixed models, also known as hierarchical models and multilevel models, are a useful class of models for many applied sciences, including biology, ecology and evolution. The goal of this course is to give a thorough introduction to the logic, theory and, most importantly, implementation of these models to solve practical problems in ecology. Participants are not expected to know mathematics beyond basic algebra and calculus. Participants are expected to know some R programming and to be familiar with linear and generalized linear regression. We will be using JAGS (Just Another Gibbs Sampler) for Markov Chain Monte Carlo (MCMC) simulations for analyzing mixed models. The course will be conducted so that participants have substantial hands-on experience.
Research postgraduates, practicing academics and primary investigators in ecology and evolutionary biology, management and environmental professionals in government and industry.
Venue – Orford Musique, 3165 Chemin du Parc, Orford, QC J1X 7A2, Canada
If you are arriving by plane, the most convenient airport is the Montréal-Pierre Elliott Trudeau International Airport.
To get from the airport to Orford Musique, you can either take an airport shuttle, rent a car or use a taxi. Note that since Orford Musique is roughly 140 km (87 miles) from the Montréal-Pierre Elliott Trudeau International Airport, a taxi ride may be costly.
A good option for airport shuttles is to use the company Aeroshuttle (https://aeronavette.ca/en/home/). A one-way trip costs 90 $CAN + taxes (103.48 $CAN) while a round trip costs 120 $CAN + taxes (137.97 $CAN). This airport shuttle will get you directly to Orford Musique.
Montréal transit system and Limocar
A cheaper but more complicated option is to take bus 747 (trajet Centre-ville) from the Montréal-Pierre Elliott Trudeau International Airport and exit at the end of the line, the “Berri-UQÀM” stop (see map below). The bus should take roughly 60 minutes to reach this stop, depending on traffic. On the bus, the fare is 10 $CAN and only coins are accepted. It is also possible to buy a bus ticket at the airport at the STM information counter. The full details of the bus route are available here: http://www.stm.info/sites/default/files/planibus_mars2018/en/747.pdf.
Availability – 30 places
Duration – 5 days
Contact hours – Approx. 37 hours
ECTS – Equal to 2 ECTS credits
Language – English
We offer COURSE ONLY and ACCOMMODATION PACKAGES:
• COURSE ONLY – Includes lunch and refreshments.
• ACCOMMODATION PACKAGE (to be purchased in addition to the course only option) – Includes breakfast, lunch, dinner, refreshments and accommodation. Accommodation is single or double occupancy, single sex en-suite rooms. Arrival Sunday 13th May and departure Friday 18th May PM.
To book ‘COURSE ONLY’ with the option to add the additional ‘ACCOMMODATION PACKAGE’ please scroll to the bottom of this page.
Other payment options are available; please email email@example.com
Cancellation policy: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact firstname.lastname@example.org. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that PR~statistics must cancel this course due to unforeseen circumstances, a full refund for the course will be credited. However, PR~statistics cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.
Each day there will be four sessions. In sessions 1 and 3, the instructor will generally discuss the topic and show one or more example analyses. During sessions 2 and 4, participants will be provided with a scientific problem and a data set. We will solve the problem together, illustrating the thought process, computing details and the interpretation of the results. This will involve substantial contribution by the participants. We believe active learning is the best way to learn a new technique.
The exact timetable for each topic will depend on the background of the participants. We anticipate the first day and a half will be devoted to the review of linear and generalized linear models. The last half of the second day will be devoted to going over JAGS and some simple applications. The third day will be used to discuss linear mixed models and their applications. The fourth day and the first half of the fifth day will be used to discuss generalized linear mixed models and their applications. The applications will be drawn from mainstream ecological situations: quantitative genetics, population dynamics, spatial and spatio-temporal data, capture-recapture models, occupancy and abundance surveys, and species distribution models, among others.
Assumed quantitative knowledge
A basic understanding of statistical concepts. Specifically, generalised linear regression models, statistical significance, hypothesis testing.
Assumed computer background
Familiarity with R. Ability to import/export data, manipulate data frames, fit basic statistical models & generate simple exploratory and diagnostic plots.
Equipment and software requirements
A laptop/personal computer with a working version of R and RStudio installed. R and RStudio are supported by both PC and Mac and can be downloaded for free by following these links.
You may also want to consider downloading the most recent version of JAGS. We will send you a list of specific R packages that you will need. Please make sure all of them are working properly on your computer before you arrive for the course.
UNSURE ABOUT SUITABILITY? THEN PLEASE ASK email@example.com
Meet at Orford Musique between 16:00 and 20:00.
Linear and Generalized linear models
To understand mixed models, the most important first step is to thoroughly understand the linear and generalized linear models. Also, when conducting the data analysis, it is useful to fit a simpler fixed effects model before trying to fit a more complex mixed effects model. Hence, we will start with a very detailed review of these models. We are assuming that the participants are familiar with these models and hence we will emphasize some important, but not commonly covered, topics. This will also give us an opportunity to unify the notation, review the basic R commands and fill in any gaps in knowledge and understanding of these topics.
1. We will show the use of non-parametric exploratory techniques such as classification and regression trees (CART) for learning about important covariates and possible non-linearities in the relationships.
2. We will emphasize graphical and simulation based methods (e.g. Gelman and Hill, 2006) to understand and explore the implications of the fitted model.
3. We will discuss graphical tools such as marginal and conditional plots that are useful for conveying the results of a multiple regression model to a lay person.
4. We will emphasize the use of graphical tools to conduct regression diagnostics and assess the appropriateness of the model.
5. We will discuss the important concepts of confounding, effect modification and interaction. These are particularly important for conducting causal, not just correlational, inference using observational studies.
Tuesday 15th – Classes from 09:00 to 17:00
Many of the topics that will be covered involve the use of matrix algebra and calculus. While these mathematical techniques are essential tools for a mathematical statistician who is trying to understand the theory behind the methods, they can be avoided in practice by using simulation based techniques. Built-in functions such as 'lm' and 'glm' fit regression models using the method of maximum likelihood to estimate the parameters and conduct statistical inference. We will discuss the use of JAGS (Just Another Gibbs Sampler) and the R package 'dclone' to fit the same models. We will use a different statistical philosophy, namely Bayesian inference, to fit these models. We will show how the Bayesian approach can be tricked into giving frequentist answers using data cloning (Lele et al. 2007, Ecology Letters). We will also discuss the rudiments of frequentist and Bayesian inference, although we will not go into the pros and cons of them at this time. That will be covered during sessions 3 and 4 of the fifth day (and over beer afterwards).
1. What makes an inference statistical inference?
2. What do we mean by probability of an event?
3. How do we quantify uncertainty in an inferential statement in the frequentist framework?
4. How do we quantify uncertainty in an inferential statement in the Bayesian framework?
We will then discuss the simulation based methods to quantify uncertainty.
1. Parametric bootstrap to quantify frequentist uncertainty
2. Markov Chain Monte Carlo to quantify Bayesian uncertainty
3. Fitting LM and GLM using JAGS and Bayesian approach
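The first item above, the parametric bootstrap, can be sketched in a few lines. This is only an illustrative sketch, written in plain Python rather than the R and JAGS used in the course. The toy data, the function names (fit_line, parametric_bootstrap_slope) and the choice of 1000 bootstrap replicates are all assumptions made for this example. The idea is to fit the model once, simulate new responses from the fitted model many times, refit each time, and read the uncertainty off the spread of the refitted estimates.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept, slope and residual standard deviation."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma = (rss / (n - 2)) ** 0.5
    return intercept, slope, sigma


def parametric_bootstrap_slope(x, y, n_boot=1000, seed=1):
    """Percentile interval for the slope, simulating from the fitted model."""
    rng = random.Random(seed)
    a_hat, b_hat, s_hat = fit_line(x, y)
    slopes = []
    for _ in range(n_boot):
        # Simulate a new response vector from the fitted normal linear model.
        y_sim = [a_hat + b_hat * xi + rng.gauss(0.0, s_hat) for xi in x]
        slopes.append(fit_line(x, y_sim)[1])
    slopes.sort()
    lo = slopes[int(0.025 * n_boot)]
    hi = slopes[int(0.975 * n_boot) - 1]
    return b_hat, (lo, hi)


# Toy data (hypothetical): true intercept 1.0, true slope 0.5, noise sd 1.0.
data_rng = random.Random(42)
x = [i / 2 for i in range(30)]
y = [1.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

b_hat, (lo, hi) = parametric_bootstrap_slope(x, y)
print(f"slope estimate {b_hat:.3f}, 95% bootstrap interval ({lo:.3f}, {hi:.3f})")
```

The same recipe applies to a GLM or a mixed model: only the "simulate from the fitted model" and "refit" steps change.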
Wednesday 16th – Classes from 09:00 to 17:00
Linear Mixed Models
Historically, linear mixed models arose in the study of quantitative genetics and heritability issues. They were successfully applied in animal breeding and led to the 'white' revolution, with an abundance of milk supply for the developing world. They were also used in horse racing and other such fun areas. The other situation where linear mixed effects models were developed was in the context of growth curves. We will follow this historical trajectory of mixed models, paying tribute to the great statisticians R. A. Fisher, C. R. Rao and Jerzy Neyman, and study linear mixed models first. The questions they tried to solve were: deciding the genetic value of a sire and/or a dam, studying heritability of traits, studying co-evolution of traits, etc. These can be answered provided we assume that the sires and dams in our experiment or sample are merely a sample from a super-population of sires and dams. In growth curve analysis, we need to take into account that each individual is unique in its own way but is also a part of a population. How do we discuss both individual level and population level inferences? In modern times, linear mixed effects models have arisen in the context of small area estimation in survey sampling, where one is interested in inferring about a census tract based on county or state level data. These models also arise in the context of combining remotely sensed data from different resolutions and types. The main issues that we will be discussing are:
1. What is a random effect? What is a fixed effect? How do we decide if an effect is random or fixed?
2. How do we modify a linear regression model to accommodate random effects?
3. Why bother fitting a mixed effects model? What do we gain?
4. How do we modify the JAGS linear models program to fit a linear mixed effects model?
5. What is the difference between a Bayesian and a frequentist inference?
6. What is a prior? What is a non-informative prior?
7. How do we interpret the results of a linear mixed effects model fit? Graphical and simulation based methods
8. How do we do model selection with mixed effects models?
9. How do we do model diagnostics in mixed effects models?
10. Parameter identifiability issues in linear mixed models
As we discuss these applications, we will discuss some subtle computational issues involved in using MCMC. In my recollection (which may be biased as it has been about 25 years since the quote), Daryl Pregibon said: MCMC is the crack cocaine of modern statistics; it is addictive, seductive and destructive. Hence, it is important for a practitioner to understand these issues in order not to misuse the MCMC technique.
1. What is a Markov Chain Monte Carlo method? Why is it necessary for mixed models?
2. What are the subtleties in implementing MCMC? Convergence of the algorithm and mixing of the chains.
3. Pros and cons of using MCMC
Thursday 17th – Classes from 09:00 to 17:00
Generalised Linear Mixed Models
We will again start the discussion of GLMM in its historical context. One of the initial uses of mixed models was in the context of over-dispersion in count data. Zero inflated count data was another important example. The example that drove the current revolution in the use of GLMM was in the context of spatial epidemiology. Clayton and Kaldor (1987, Biometrics) showed that one can use spatial correlation to improve the prediction in mapping disease rates. This was also an example of the application of Empirical Bayes methods that allow one to pool information from different spatial areas (or studies, or scales, and so on).
1. Zero inflated data. In many practical situations, we observe that there are many locations where there are zero counts, far in excess of what would be expected under the Poisson regression model. This can be effectively modelled using a mixed model framework. The mixed models framework allows us to use much more complex and realistic models.
2. Over-dispersion in GLM, spatial GLM, spatio-temporal GLM. The Poisson regression model assumes that the mean and variance are equal. This is often not true in practice. Generally, the variance in the data exceeds the mean. One can show that such over-dispersion can be modelled using a mixed effects model. These models also arise in the context of capture-recapture sampling, where capture probabilities vary across space or time or individuals.
3. Longitudinal or panel data with a discrete response variable. Many times we have data on different individuals where within the individual there is temporal dependence but individuals are independent of each other. Cluster sampling is another situation where we have dependence within a cluster but independence between clusters. The analysis of such data needs to take into account the innate variation between individuals before one can discuss the effect of interesting covariates or risk factors. Such data are effectively modelled as GLMM.
4. Measurement error, missing data. Missing data and measurement error are ubiquitous in ecological studies. Mixed models provide a convenient way to take into account these difficulties and infer about the underlying processes of interest. We will discuss these issues in the context of population viability analysis, spatial population dynamics and source-sink analysis, and occupancy and abundance surveys. These also arise in ordinary linear and generalized linear models if the covariates are measured with error.
5. Additional topics depending on the interest of the participants. These may include, for example, discussion of Species Distribution Models, Resource Selection Functions and Animal movement models.
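The claim in item 2, that a random effect produces over-dispersion relative to the Poisson model, can be checked with a small simulation. The sketch below is illustrative only and is written in plain Python rather than the R and JAGS used in the course. The base rate of 5, the gamma-distributed multiplicative random effect (mean 1, variance 0.5) and the helper name rpois are all assumptions made for this example. With these choices the mixed counts should have a mean near 5 and a variance near 5 + 25 × 0.5 = 17.5, whereas the plain Poisson counts should have mean and variance both near 5.

```python
import math
import random
import statistics


def rpois(rng, lam):
    """Draw one Poisson(lam) value by Knuth's method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(2018)
n, base_rate = 20000, 5.0

# Plain Poisson counts: every site shares the same rate.
plain = [rpois(rng, base_rate) for _ in range(n)]

# Mixed model: each site's rate is multiplied by its own random effect,
# drawn from a gamma distribution with shape 2 and scale 0.5 (mean 1).
mixed = [rpois(rng, base_rate * rng.gammavariate(2.0, 0.5)) for _ in range(n)]

for label, counts in (("Poisson", plain), ("Poisson + random effect", mixed)):
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    print(f"{label}: mean {mean:.2f}, variance {var:.2f}, ratio {var / mean:.2f}")
```

A variance-to-mean ratio well above 1 in real count data is therefore one signal that a site-level or individual-level random effect may be worth including.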
Computational issues: Advanced topics
Friday 18th – Classes from 09:00 to 17:00
Mixed Models in a Bayesian Framework
MCMC is not the only approach to analyse mixed models. We will briefly discuss Laplace approximation based techniques (INLA, in particular) along with approximate techniques such as composite likelihood and Approximate Bayesian Computation. Because of the mathematical nature, this discussion will be somewhat limited, only giving the basics and hinting at the important issues.
Philosophical issues: Sophie’s choice
1. What are the philosophical problems with using the frequentist quantification of uncertainty?
2. What are the philosophical problems with using the Bayesian quantification of uncertainty?
3. Sophie’s choice?