Reproducible data science for population genetics (RDPG01)
23 October 2017 - 27 October 2017
£260.00 - £580.00
With the increasing availability of various types of genetic and genomic data, population genetics and molecular ecology are becoming largely data-driven sciences. Understanding the evolutionary, demographic, and ecological underpinnings of the genetic makeup of natural populations now relies on a combination of exploratory approaches and models. This course will provide an in-depth introduction to these techniques, with a strong emphasis on reproducibility through the use of modern analytic practices and tools. After an introduction to phylogenetic reconstruction, the course will cover a number of multivariate approaches for the analysis of genetic patterns, including supervised and unsupervised factorial methods, clustering approaches, and advanced methods for describing population diversity and revealing spatial genetic patterns. The approaches introduced will be applicable to most types of genetic data, including markers such as microsatellites, SNPs, or AFLPs, as well as nucleotide and amino-acid sequence data. Each day will start with a lecture on a particular type of problem and the methods used to address it, followed by an introduction to a specific technique for reproducible data analysis; afternoons will be devoted to hands-on practicals. The last day will be devoted to open problems, where participants will be able to analyse their own data.
The course is aimed at PhD students, research postgraduates, and practising academics, as well as people in industry working with genetic data in fields such as molecular ecology, evolutionary biology, and phylogenetics.
We offer COURSE ONLY and ACCOMMODATION PACKAGE options:
• COURSE ONLY – Includes lunch and refreshments.
• ACCOMMODATION PACKAGE (to be purchased in addition to the course only option) – Includes breakfast, lunch, dinner, refreshments, minibus to and from the meeting point, and accommodation. Accommodation is in multiple-occupancy (max. 3 people), single-sex, en-suite rooms. Arrival Sunday 22nd October and departure Friday 27th October PM.
To book ‘COURSE ONLY’ with the option to add the additional ‘ACCOMMODATION PACKAGE’ please scroll to the bottom of this page.
Other payment options are available; please email firstname.lastname@example.org
Cancellation policy: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Later cancellations may be considered; please contact email@example.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that PRstatistics must cancel this course due to unforeseen circumstances, a full refund for the course will be credited. However, PRstatistics cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.
A mixture of lectures and hands-on practicals. Data sets for computer practicals will be provided by the instructors, but participants are welcome to bring their own data.
Assumed quantitative knowledge
A basic understanding of concepts in population genetics and the statistical analysis of genetic data.
Assumed computer background
Previous experience of data analysis using R is required, such as the ability to import/export data, manipulate data frames, fit basic statistical models, and generate simple exploratory and diagnostic plots.
Equipment and software requirements
A laptop/personal computer with a working version of R and RStudio installed. R and RStudio are supported on both PC and Mac and can be downloaded for free from the R Project (CRAN) and RStudio websites.
It is essential that you come with all necessary software and packages already installed (you will be sent a list of packages prior to the course), as internet access may not always be available.
UNSURE ABOUT SUITABILITY? THEN PLEASE ASK firstname.lastname@example.org
Sunday 22nd – Meet at Margam Discovery Centre at approximately 18:30
Monday 23rd – Classes from 09:00 to 17:00
Intro to phylogenetic reconstruction
Module 1a: reconstructing phylogenies from genetic sequence data. Three main approaches covered: distance-based phylogenies; maximum parsimony; and likelihood-based approaches.
Module 1b: reproducible data science using R: an introduction
Practical 1: phylogenetic reconstruction using R. Three main approaches plus rooting a tree; assessing/testing for a molecular clock; and bootstrapping.
Main packages: knitr, ape, phangorn.
Tuesday 24th – Classes from 09:00 to 17:00
Introduction to multivariate analysis of genetic data
Module 2a: key concepts in multivariate analysis. Focus on using factorial methods for genetic data analysis.
Module 2b: using R to generate high-quality PDF or Word documents.
Practical 2: multivariate analysis of genetic data in R. Topics include: data handling, Hardy-Weinberg tests, measures of diversity, tests of population structure, principal component analysis (PCA), multidimensional scaling (MDS).
Main packages: knitr, rmarkdown, adegenet, ade4, pegas, hierfstat, ape.
Wednesday 25th – Classes from 09:00 to 17:00
Exploring group diversity
Module 3a: approaches for identifying and describing genetic clusters. Topics include: hierarchical clustering, K-means, genetic distances between populations, supervised factorial methods including between-group PCA and the Discriminant Analysis of Principal Components (DAPC).
Module 3b: using R to generate Beamer and HTML5 slides.
Practical 3: applying the approaches covered in the morning lecture, emphasising their strengths and weaknesses.
Main packages: rmarkdown, adegenet, ade4, hierfstat.
Thursday 26th – Classes from 09:00 to 17:00
Spatial genetic structures
Module 4a: on the origins of spatial genetic patterns, how to test for them, and how to reveal and visualise them.
Module 4b: asking questions the right way with reproducible code.
Practical 4: visualising and analysing spatial genetic data. Topics: spatial density estimates, univariate and multivariate tests of spatial structure (Moran and Mantel tests), mapping principal components from unsupervised methods (PCA), spatial PCA.
Main packages: reprex, adegenet, spdep, ade4.
Friday 27th – Classes from 09:00 to 16:00
Reproducible data science for population genetics in practice
Open problem day – analyse your own data using R
Main packages: knitr, rmarkdown, adegenet, ade4, ape, pegas, phangorn, hierfstat, poppr, ggplot2, etc.