School Seminars and Colloquia

Dealing with the GC-content bias in second-generation DNA sequence data

Statistics Seminar

by Terry Speed

Institution: The Walter and Eliza Hall Institute of Medical Research
Date: Tue 27th September 2011
Time: 1:00 PM
Location: Room 213 Richard Berry Building, University of Melbourne

Abstract: The field of genomics is currently dealing with an explosion of data from so-called second-generation DNA sequencing machines. This is creating many challenges and opportunities for statisticians interested in the area. In this talk I’ll outline the technology and the data flood, and move on to one particular problem where the technology is used: copy-number analysis. There we find a novel bias which, if not dealt with properly, can dominate the signal of interest. I’ll describe how we think about and summarize it, and go on to identify a plausible source of this bias, leading up to a way of removing it. Our approach makes use of the total variation metric on discrete measures but, apart from this, is largely descriptive.

