Making genotyping cheaper and easier

NIAB-EMR: Dr Richard Harrison

Background

For breeding programmes to implement marker-assisted breeding and genomic selection effectively, individuals must be genotyped. Genotyping can never be too cheap or too easy. Genotyping with sequencing-based (GwS) approaches is increasingly cheap and promises genotyping costs of <$1/sample (Bevan, Uauy, Wulff, Zhou, Krasileva, and Clark. Nature 2017). Sequencing is also more flexible: adding new markers or changing marker numbers is quick, whereas SNP chips need to be redesigned and remade. Yet despite its tremendous promise, sequencing has yet to replace the older SNP chip genotyping method. This is in part because of total costs: SNP chip data are easy to analyse on a desktop or laptop, whereas sequence data require bioinformatics expertise and the use of servers, which greatly increases costs and time. Critically, breeders who do not have their own bioinformatics facilities must release their private data to a third party for this work.

Objectives and approach

The student will develop a new, faster and more efficient approach that calls genotypes on a laptop in minutes or seconds and that nearly anyone could use. These efficiencies will be achieved as follows:

  1. The standard approach aligns all sequence reads for all samples to the whole genome, a slow and costly procedure. Because all GwS-style methods reduce the fraction of the genome sequenced to the same 0.01-0.1% in every sample, this is unnecessary, especially in a typical structured cross where there are few possible sequences. Instead, we will compare only these sequenced fractions to each other.
  2. The comparison will use a computationally efficient alignment-free method, e.g. kmer hashes. Kmers are short words, but because there are 4 bases (A, C, G and T) there are more than 4,000,000,000,000,000,000 possible 31-base kmers (4^31, roughly 4.6 x 10^18). Most will be unique if they occur at all in a fruit crop genome, few (0.1% of the genome kmers) will be present in our GwS samples, and even fewer are markers. A recent paper (Bray, Pimentel, Melsted & Pachter. Nature Biotech. 2016) has shown how a similar approach can be used to match millions of reads to a small fraction of a large (human) genome in just minutes on a standard laptop.
  3. This tool will run on a laptop with a simple user interface, ideally even allowing drag and drop of files.
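
The alignment-free idea in points 1 and 2 can be illustrated with a minimal Python sketch. It is not the project's method, which is still to be developed: the function names (canonical, kmer_set, call_genotype) and the toy allele sequences are hypothetical, the example uses k=5 rather than 31 so that it fits on a few lines, and calling on simple presence or absence of a kmer is an assumption made for brevity. A real tool would need to count kmers and allow for sequencing errors. Python sets are hash-based, so each lookup takes roughly constant time and no alignment is performed.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a kmer and its reverse complement, so both strands match."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequences: Iterable[str], k: int = 31) -> Set[str]:
    """Collect the canonical kmers present in reads or allele sequences."""
    found = set()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip windows containing N or other ambiguous bases.
            if all(base in "ACGT" for base in kmer):
                found.add(canonical(kmer))
    return found


def call_genotype(reads: Iterable[str], ref_allele: str, alt_allele: str, k: int = 31) -> str:
    """Call one marker from the presence of allele-specific kmers (hypothetical helper)."""
    ref_kmers = kmer_set([ref_allele], k)
    alt_kmers = kmer_set([alt_allele], k)
    # Only kmers that distinguish the two alleles are informative.
    ref_only = ref_kmers - alt_kmers
    alt_only = alt_kmers - ref_kmers
    sample = kmer_set(reads, k)
    has_ref = not sample.isdisjoint(ref_only)
    has_alt = not sample.isdisjoint(alt_only)
    if has_ref and has_alt:
        return "ref/alt"
    if has_ref:
        return "ref/ref"
    if has_alt:
        return "alt/alt"
    return "missing"


# Toy marker: the two alleles differ by a single base (G versus A).
REF = "ACGTTGCAAGG"
ALT = "ACGTTACAAGG"
ALT_REVERSE_STRAND = "CCTTGTAACGT"  # reverse complement of ALT

print(call_genotype([REF, ALT_REVERSE_STRAND], REF, ALT, k=5))
```

Canonical kmers are used so that a read from either DNA strand matches the same marker. Only kmers that overlap the variant base are compared, which reflects the point above that very few of the possible kmers are markers.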

This project will train a student in bioinformatics, a vital skill for the industry, and will benefit a whole range of industry and AHDB-funded breeding programmes, as well as producing a valuable tool for the industry.

The studentship begins in October 2018. The successful candidate should have (or expect to have) an Honours Degree (or equivalent) at 2.1 or above in a relevant subject.

Anyone interested should send an application (CV, cover letter, personal statement and the names of two referees) to recruitment@emr.ac.uk, citing the project reference. The application deadline is 13 April 2018.