Reproducible data science for population genetics (RDPG02)
11th February 2019 - 15th February 2019
£275.00 - £490.00
With the increasing availability of various types of genetic and genomic data, population genetics and molecular ecology are becoming largely data-driven sciences. Understanding the evolutionary, demographic, and ecological processes underpinning the genetic makeup of natural populations now relies on a combination of exploratory approaches and models. This course will provide an in-depth introduction to these techniques, with a strong emphasis on reproducibility through the use of modern analytic practices and tools. After an introduction to phylogenetic reconstruction, the course will cover a number of multivariate approaches for the analysis of genetic patterns, including supervised and unsupervised factorial methods, clustering approaches, and advanced methods for describing population diversity and revealing spatial genetic patterns. The approaches introduced will be applicable to most genetic data, including markers such as microsatellites, SNPs, or AFLP, as well as nucleotide and amino-acid sequence data. Every day will start with a lecture dedicated to a type of problem and the associated methods, followed by an introduction to a specific technique for reproducible data analysis; afternoons will be devoted to hands-on practicals. The last day will be devoted to open problems, where participants will be able to analyse their own data.
The course is aimed at PhD students, research postgraduates, and practicing academics as well as persons in industry working with genetic data in fields such as molecular ecology, evolutionary biology, and phylogenetics.
Venue – PR statistics head office
Availability – 24 places
Duration – 5 days
Contact hours – Approx. 37 hours
ECTS – Equal to 3 ECTS credits
Language – English
We offer COURSE ONLY and ACCOMMODATION PACKAGES:
• COURSE ONLY – Includes lunch and refreshments.
• ACCOMMODATION PACKAGE (to be purchased in addition to the course only option) – Includes breakfast, lunch, welcome dinner Monday evening, farewell dinner Thursday evening, refreshments and accommodation. Self-catering facilities are available in the accommodation. Accommodation is approx. a 6-minute walk from the PR statistics head office. Accommodation is multiple occupancy (max 3-4 people) in single-sex en-suite rooms. Arrival Sunday 10th February (after 5pm) and departure Friday 15th February (accommodation must be vacated by 9am).
To book ‘COURSE ONLY’ with the option to add the additional ‘ACCOMMODATION PACKAGE’ please scroll to the bottom of this page.
Other payment options are available; please email email@example.com
Cancellation policy: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact firstname.lastname@example.org. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that PRstatistics must cancel this course due to unforeseen circumstances, a full refund for the course will be credited. However, PRstatistics cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.
A mixture of lectures and hands-on practicals. Data sets for computer practicals will be provided by the instructors, but participants are welcome to bring their own data.
Assumed quantitative knowledge
A basic understanding of concepts in population genetics and the statistical analysis of genetic data.
Assumed computer background
Previous experience with data analysis using R is required, such as the ability to import/export data, manipulate data frames, fit basic statistical models, and generate simple exploratory and diagnostic plots.
Equipment and software requirements
A laptop/personal computer with a working version of R and RStudio installed. R and RStudio are supported on both PC and Mac and can be downloaded for free.
It is essential that you come with all necessary software and packages already installed (you will be sent a list of packages prior to the course), as internet access may not always be available.
UNSURE ABOUT SUITABILITY? THEN PLEASE ASK email@example.com
Sunday 10th – Meet at 43 Cook Street, Glasgow G5 8JN at approx. 17:00 onwards
Monday 11th – Classes from 09:30 to 17:30
Intro to phylogenetic reconstruction
Module 1a: reconstructing phylogenies from genetic sequence data. Three main approaches covered: distance-based phylogenies; maximum parsimony; and likelihood-based approaches.
Module 1b: reproducible data science using R: an introduction
Practical 1: phylogenetic reconstruction using R. Three main approaches plus rooting a tree; assessing/testing for a molecular clock; and bootstrapping.
Main packages: knitr, ape, phangorn.
Tuesday 12th – Classes from 09:30 to 17:30
Introduction to multivariate analysis of genetic data
Module 2a: key concepts in multivariate analysis. Focus on using factorial methods for genetic data analysis.
Module 2b: using R to generate high-quality PDF or Word documents.
Practical 2: multivariate analysis of genetic data in R. Topics include: data handling, Hardy-Weinberg tests, measures of diversity, tests of population structure, principal component analysis (PCA), multidimensional scaling (MDS).
Main packages: knitr, rmarkdown, adegenet, ade4, pegas, hierfstat, ape.
Wednesday 13th – Classes from 09:30 to 17:30
Exploring group diversity
Module 3a: approaches for identifying and describing genetic clusters. Topics include: hierarchical clustering, K-means, genetic distances between populations, supervised factorial methods including between-group PCA and the Discriminant Analysis of Principal Components (DAPC).
Module 3b: using R to generate Beamer and HTML5 slides.
Practical 3: applying the approaches covered in the morning lecture and emphasising their strengths and weaknesses.
Main packages: rmarkdown, adegenet, ade4, hierfstat.
Thursday 14th – Classes from 09:30 to 17:30
Spatial genetic structures
Module 4a: on the origins of spatial genetic patterns, how to test for them, and how to reveal and visualise them.
Module 4b: asking questions the right way with reproducible code.
Practical 4: visualising and analysing spatial genetic data. Topics: spatial density estimates, univariate and multivariate tests of spatial structure (Moran and Mantel tests), mapping principal components from unsupervised methods (PCA), spatial PCA.
Main packages: reprex, adegenet, spdep, ade4.
Friday 15th – Classes from 09:30 to 16:00
Reproducible data science for population genetics in practice
Open problem day – analyse your own data using R
Main packages: knitr, rmarkdown, adegenet, ade4, ape, pegas, phangorn, hierfstat, poppr, ggplot2, etc.