[ Announcements | Projects | Homeworks ]



CSC 650 Algorithms for Computational Biology

Time and place Spring 2011
Monday and Wednesday, 10:30-11:45am
Gould-Simpson 701
Dates
January 12
January 17
January 19
March 14, 16
May 4
  President Obama campus visit
  Martin Luther King Jr. holiday
  First class
  Spring break
  Last class
Instructor John Kececioglu
kece@cs.arizona.edu
Gould-Simpson 720
(520) 621-4526
Office hours Monday 1:30-3:00pm
Wednesday 2:00-3:00pm
and by appointment
Description This course exposes students to current research on discrete algorithms in bioinformatics and computational biology. The focus is on the application of string and graph algorithms and combinatorial optimization to problems in biological sequence analysis, genome rearrangement, evolutionary trees, and structure prediction. The emphasis throughout is on problem formulation, algorithm design, and implementation.

As this is an advanced topics course, there are no exams or final. Grades are based on homeworks (which stress algorithm design and analysis) and projects (which stress algorithm implementation and independent research). In an implementation project, a significant algorithm studied in class we will be programmed and evaluated. In a research project, an independent topic chosen by the student and instructor will be researched, written up, and presented. Programming project assignments specify an interface for a C++ class library, and are graded by checking correctness on test instances and measuring execution times.

Prerequisites CSC 545 or permission of instructor
Required text Wing-Kin Sung,
Algorithms in Bioinformatics: A Practical Introduction,
Chapman and Hall, Boca Raton, Florida, 2010.
Optional text Dan Gusfield,
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology,
Cambridge University Press, New York, 1997.
Related books The following books are also relevant. Those that are available will be placed on reserve in the library.
Syllabus The course begins with the classic area of sequence analysis, focusing on sequence comparison, multiple sequence alignment, and sequence assembly. We then move to more diverse topics in genome rearrangement, evolutionary trees, and predicting the structure of biological molecules.

I   Exact string matching

  • Suffix arrays and a linear-time construction algorithm.
  • Suffix trees and linear-time construction from a suffix array.
  • Burroughs-Wheeler transform and its use in compression and exact string matching.
  • Applications to static and dynamic string matching, longest k-common substrings, and finding all maximal repeats.

II   Approximate string matching

  • Preliminaries: edit distance, reduction to string alignment, solution by dynamic programming; equivalence of minimizing distance and maximizing similarity; recurrences for database searching and local similarity.
  • Fundamentals: linear gap-costs, linear-space alignment, unit-cost edits, and heaviest common-subsequences.
  • Advanced techniques: convex gap costs, suboptimal alignments, the four-Russians technique, incremental string alignment, and genome alignment.

III   Multiple sequence alignment

  • Generic multiple alignment and the five basic objective functions.
  • Exact and approximation algorithms for the sum-of-pairs, fixed-tree, and maximum-trace objectives.

IV   Sequence assembly

  • Shortest common superstring: approximation with respect to compression and length; relation to computational learning theory.
  • Shotgun sequencing: approximation with respect to compression; double-barreled sequencing; separating repeats.
  • Next generation sequencing: assembling genomes from very short reads, and mapping reads to a reference genome.

V   Genome rearrangement

  • Inversions: signed and unsigned variations; exact and approximation algorithms.
  • Translocations: centromere variations; exact and approximation algorithms.
  • Putting it all together: mixed inversion, translocation, fission, and fusion.

VI   Evolutionary trees

  • Tree construction: compatibility, maximum parsimony, maximum likelihood, and distance-based methods.
  • Tree comparison: measuring similarity of differing trees, finding consensus trees.

VII   Structure prediction

  • RNA secondary structure prediction: inferring structural base-pairing of RNA sequences.
  • Protein secondary structure prediction: predicting alpha-helices and beta-strands in protein sequences.
Homeworks There are roughly three homework assignments, which emphasize algorithm design and analysis.
Projects There is one programming project, emphasizing algorithm implementation.
Grading
50%   Homeworks
50%   Projects
Policies On homework you may discuss general ideas with friends but your solutions must be written up separately and represent individual work. Use of solutions from previous offerings of the course is not permitted. Homework is due at the start of class; homework turned in once class begins is not accepted.

When you turn in solutions to homeworks write only on one side of the paper and staple your pages together. (Do not fold them!) Neatness and especially conciseness in your write-up is required to earn the highest marks. If you are tempted to ramble at length about ideas that don't quite work, please don't: points will be taken off. If you cannot solve a problem admit this up-front in your write-up, and write down only what you know to be correct.

Announcements

Note that the due date for the programming project has been moved to May 9. The assignment for the project is been posted above.

To learn more about algorithms for bioinformatics, you are welcome to attend the Department's Computational Biology Seminar. This is a weekly journal club and research group meeting for students working with John Kececioglu.



[ Top | Department ]

http://www.cs.arizona.edu/classes/cs650/spring11/
John Kececioglu (kece@cs.arizona.edu)
18 April 2011