Dealing with the GC-content bias in second-generation DNA sequence data
by Terry Speed
Abstract: The field of genomics is currently dealing with an explosion of data from so-called second-generation DNA sequencing machines. This is creating many challenges and opportunities for statisticians interested in the area. In this talk I’ll outline the technology and the data flood, and move on to one particular problem where the technology is used: copy-number analysis. There we find a novel bias, which, if not dealt with properly, can dominate the signal of interest. I’ll describe how we think about and summarize it, and go on to identify a plausible source of this bias, leading up to a way of removing it. Our approach makes use of the total variation metric on discrete measures, but apart from this, is largely descriptive.
For More Information: contact: Mihee Lee. email: firstname.lastname@example.org