[ Announcements | Projects | Homeworks ]


CSc 650 Algorithms for Computational Biology

Time and place Tuesdays and Thursdays, 12:30-1:45pm, Haury 219
Dates January 17-May 6, 2008
Instructor John Kececioglu
kece@cs.arizona.edu
Gould-Simpson 720
(520) 621-4526
Office hours Tuesday 1:45-3:00pm, Thursday 3:30-4:45pm, and by appointment.
Description This course exposes students to current research on discrete algorithms in computational biology. The focus is on the application of string and graph algorithms and combinatorial optimization to problems in sequence comparison, multiple alignment, physical mapping, sequence assembly, and genome rearrangement. The emphasis throughout is on problem formulation, algorithm design, and implementation.

As this is an advanced topics course, there are no exams or final. Grades are based on homeworks (which stress algorithm design), and programming projects (which stress algorithm implementation). Each project implements a significant algorithm studied in class. Project assignments specify an interface for a C++ class library, and are graded by checking correctness on test instances, comparing solution values of near-optima, and measuring execution times.

Prerequisites CSc 545, or permission of instructor.
Required text Hans-Joachim Böckenhauer and Dirk Bongartz,
Algorithmic Aspects of Bioinformatics,
Springer-Verlag, Berlin, 2007.
Optional text Dan Gusfield,
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology,
Cambridge University Press, New York, 1997.
Related books The following books are also relevant. Those that are available will be placed on reserve in the Main Library.
Syllabus The course focuses on three main topics: suffix arrays and trees, multiple sequence alignment, and genome rearrangement. In each of these topics we will learn a significant result and its implementation. Along the way we will also cover sequence comparison, physical mapping, and sequence assembly. Two topics not covered are evolutionary trees and protein folding.

I   Exact string matching

  • Suffix arrays and a linear-time construction algorithm.
  • Suffix trees and linear-time construction from a suffix array.
  • Applications to static and dynamic string matching, longest k-common substrings, and finding all maximal repeats.

II   Approximate string matching

  • Preliminaries: edit distance, reduction to string alignment, solution by dynamic programming; equivalence of minimizing distance and maximizing similarity; recurrences for database searching and local similarity.
  • Fundamentals: linear gap-costs, linear-space alignment, unit-cost edits, and heaviest common-subsequences.
  • Advanced techniques: convex gap costs, suboptimal alignments, the four-Russians technique, and incremental string alignment.

III   Multiple sequence alignment

  • Generic multiple alignment and the five basic objective functions.
  • Exact and approximation algorithms for the sum-of-pairs, fixed-tree, and maximum-trace objectives.

IV   Physical mapping

  • Reconstructing order: mapping with unique, non-unique, and non-overlapping probes.
  • Reconstructing distance: mapping with non-overlapping probes.

V   Sequence assembly

  • Shortest common superstring: approximation with respect to compression and length; relation to computational learning theory.
  • Shotgun sequencing: approximation with respect to compression; double-barreled sequencing; separating repeats.

VI   Genome rearrangement

  • Inversions: signed and unsigned variations; exact and approximation algorithms.
  • Translocations: centromere variations; exact and approximation algorithms.
  • Putting it all together: mixed inversion, translocation, fission, and fusion.
Projects Here is the programming project.
Homeworks Here are the homework assignments.
Grading
50%   Project
50%   Homeworks
Policies On homework you may discuss general ideas with friends but your solutions must be written up separately and represent individual work. Use of solutions from previous offerings of the course is not permitted. Homework is due at the start of class; homework turned in once class begins is not accepted.

When you turn in solutions to homeworks write only on one side of the paper and staple your pages together. (Do not fold them!) Neatness and especially conciseness in your write-up is required to earn the highest marks. If you are tempted to ramble at length about ideas that don't quite work, please don't: points will be taken off. If you cannot solve a problem admit this up-front in your write-up, and write down only what you know to be correct.

Announcements

If you'd like to learn more about algorithms applied to molecular biology, you might consider attending the department's Computational Biology Seminar.



[ Top | Department ]

http://www.cs.arizona.edu/classes/cs650/spring08/
John Kececioglu (kece@cs.arizona.edu)
11 March 2008