School Seminars and Colloquia

Evaluation of Similarity Between Two Sequences

Statistics Seminar

by Susan Wilson


Institution: Australian National University
Date: Tue 6th May 2008
Time: 10:00 AM
Location: Architecture Room 103 (eZone) The University of Melbourne

Abstract: The question "How should we measure and evaluate the similarity between
two sequences?" is the focus of this presentation. A commonly encountered
situation, particularly in biology, is to have a query sequence and want to find
which sequences (say, DNA or protein) in a large database have "significant"
similarity to it. The widely accepted solution is based on the notion of
alignment, and the BLAST program is the most frequently used method. There are,
however, many situations in which the assumption underlying local alignment
methods is violated. Hence there has been much interest in developing
alignment-free sequence comparison algorithms. Arguably the best of these is
based on the number of k-words shared between two sequences. This statistic,
called D2, is simple and extremely fast to compute. Its distribution and
asymptotic properties are being studied, and recent results and unsolved
problems will be surveyed.
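To make the D2 statistic concrete, here is a minimal sketch assuming the usual definition D2 = sum over all k-words w of X_w * Y_w, where X_w and Y_w count the (overlapping) occurrences of w in the two sequences. Equivalently, D2 is the number of position pairs whose k-words match exactly. The function names kword_counts and d2 are hypothetical and chosen for illustration only; this is not the speaker's implementation.

```python
from collections import Counter


def kword_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 = sum over k-words w of X_w * Y_w.

    This equals the number of (position in a, position in b) pairs
    whose k-words are identical. Only one linear pass over each
    sequence is needed, which is why D2 is fast to compute.
    """
    x = kword_counts(a, k)
    y = kword_counts(b, k)
    # Indexing a Counter with a missing key returns 0 without inserting it.
    return sum(count * y[w] for w, count in x.items())


print(d2("ACGTACGT", "CGTA", 3))
```

In the usage line, "ACGTACGT" contains CGT twice and GTA once, and "CGTA" contains each of them once, so D2 = 2*1 + 1*1 = 3 matching 3-word pairs.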

For More Information: email Dr. Guoqi Qian, g.qian@ms.unimelb.edu.au