Research
Computational Biology
The Computational Biology cluster applies computer science, mathematics, and statistics to computational questions in biology. Work of the current cluster members centers on algorithm design and software development.
Faculty
Kobus Barnard
John Kececioglu
Carol Soderlund
Projects
Extended Description
Broadly speaking, computational biology is the application of computer science, mathematics, and statistics to computational questions in biology. There are four general approaches to the field:
- doing biology, aided by these mathematical disciplines
- designing algorithms for analytical tasks that frequently arise
- implementing software that helps biologists carry out their analyses
- evaluating the statistical significance of the computed solutions to determine whether a meaningful result has been found
While the best work encompasses all four approaches, most researchers concentrate on one or two. Work of the current cluster members centers on algorithm design and software development.
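The fourth approach, assessing whether a computed solution is statistically meaningful, is often done with a permutation test: shuffle the data to destroy any real signal, recompute the score many times, and ask how often chance alone does as well. The sketch below is a minimal illustration, not the cluster's method. It assumes an ungapped identity score on equal-length sequences and a shuffling null model that preserves letter composition; the function names `identity_score` and `permutation_p_value` are hypothetical.

```python
import random


def identity_score(a: str, b: str) -> int:
    """Count positions where two equal-length sequences agree (ungapped score)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


def permutation_p_value(a: str, b: str, n_perm: int = 999, seed: int = 0) -> float:
    """Empirical p-value: how often does a shuffled b score at least as well?"""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = identity_score(a, b)
    letters = list(b)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(letters)  # keeps b's composition, destroys its order
        if identity_score(a, "".join(letters)) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_good + 1) / (n_perm + 1)
```

For example, `permutation_p_value(s, s)` on a sequence of a few dozen bases is very small, while two sequences with no matching positions get a p-value of 1.0. Real analyses use gapped alignment scores and more careful null models, which this sketch does not attempt.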
Our research has addressed a broad spectrum of tasks from many areas of computational biology, including:
- shotgun sequence assembly and chromosome physical mapping, from the area of genome sequencing (John Kececioglu and Carol Soderlund)
- analysis of genome rearrangements, from comparative genomics (John Kececioglu)
- construction of multiple sequence alignments, from sequence analysis (John Kececioglu)
- gene-expression microarray analysis, from gene regulation (Kobus Barnard)
- high-throughput protein identification by mass spectrometry, from proteomics (Kobus Barnard)
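For the genome-rearrangement work listed above, a standard starting point is the breakpoint lower bound on inversion (reversal) distance. A gene order on one chromosome is written as a signed permutation of 1..n, framed by 0 and n+1; two consecutive entries x, y form an adjacency when y - x = 1, and every other consecutive pair is a breakpoint. A single inversion changes at most two breakpoints, so at least half the breakpoint count (rounded up) inversions are needed. The sketch below computes only this textbook bound, under the stated single-chromosome assumption; it is not the cluster's exact algorithm, does not model translocations, and the function names are hypothetical.

```python
def count_breakpoints(perm: list[int]) -> int:
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed by 0 and n+1. Consecutive entries x, y form an
    adjacency when y - x == 1, which also covers reversed runs such as -3, -2.
    """
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if y - x != 1)


def inversion_lower_bound(perm: list[int]) -> int:
    """One inversion removes at most two breakpoints, so d >= ceil(b / 2)."""
    return (count_breakpoints(perm) + 1) // 2
```

For instance, `[-2, -1, 3]` has two breakpoints and a bound of 1, which is tight here: inverting the first two genes yields the identity `[1, 2, 3]`.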
Mathematical formulations of many of these tasks lead to combinatorial optimization problems that are NP-complete, meaning that no polynomial-time algorithm for them is known, and none exists unless P = NP. Much of our work on these problems has been designing either (1) exact algorithms, which are guaranteed to find an optimal solution and run fast in practice on biological data, although they take exponential time in the worst case, or (2) approximation algorithms, which are guaranteed to find a solution within a provable factor of optimal and always run in polynomial time, even in the worst case. Some of our achievements include:
- the only exact and approximation algorithms known for shotgun sequence assembly
- the most accurate algorithm known for physical mapping with nonoverlapping probes
- the first exact and approximation algorithms for analysis of genome rearrangements by inversions and translocations
- the release of MSA, a widely used software tool for multiple sequence alignment
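The contrast between exact and approximation algorithms can be illustrated on a simplified model of shotgun assembly: the shortest common superstring of a set of error-free reads, which is NP-hard. The sketch below is a toy, not the cluster's algorithms. It assumes reads have no sequencing errors and ignores reverse complements. `exact_scs` tries every read order and so takes factorial time, while `greedy_scs` repeatedly merges the pair with the largest overlap; this greedy strategy is known to produce a superstring within a constant factor of the optimal length. All function names are hypothetical, and the input is assumed to be a non-empty list of reads.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_contained(reads: list[str]) -> list[str]:
    """Drop duplicates and reads that occur inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != s and r in s for s in unique)]


def merge_in_order(order) -> str:
    """Concatenate reads in the given order, collapsing maximal overlaps."""
    s = order[0]
    for r in order[1:]:
        s += r[overlap(s, r):]
    return s


def exact_scs(reads: list[str]) -> str:
    """Exact but exponential: try every ordering of the reads."""
    reads = remove_contained(reads)
    return min((merge_in_order(p) for p in permutations(reads)), key=len)


def greedy_scs(reads: list[str]) -> str:
    """Approximate but polynomial: always merge the best-overlapping pair."""
    reads = remove_contained(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads = remove_contained(rest + [merged])
    return reads[0]
```

On the reads `["ACGTT", "GTTCA", "TCAGG"]` both functions return `"ACGTTCAGG"`; on larger or repetitive inputs the greedy answer can be longer than the optimum, which is the price paid for its polynomial running time.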
Our current research in sequence analysis focuses on local alignment of genomes, multiple alignment of protein families, and motif finding for transcription factor binding sites. Future research goals include developing a portable, reliable, and freely available software library that implements algorithms for fundamental problems in computational biology.
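Local alignment, mentioned above, finds the highest-scoring pair of substrings of two sequences. The classic dynamic program for this is the Smith-Waterman algorithm, sketched below for illustration only. It assumes a simple scoring scheme (match +2, mismatch -1, linear gap penalty -2) chosen arbitrarily; genome-scale local alignment tools add seeding heuristics and affine gap costs that this sketch omits, and it is not the cluster's method. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best local score, aligned substring of a, aligned substring of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets an alignment start fresh anywhere (the "local" part).
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTACGTAAA", "GGACGTGG")` returns `(8, "ACGT", "ACGT")`: the shared core scores four matches, and extending in either direction only adds mismatches. The table takes time and space proportional to the product of the sequence lengths, which is why whole-genome local alignment relies on faster heuristics.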