Colloquium Speaker

Speaker: 
Saharon Rosset, Ph.D., Data Analytics Group, IBM T.J. Watson Research Center
Topic: 
The Genographic Project: Introduction, Early Results and Statistical Challenges
Date: Thursday, February 22, 2007
Time: 11:00 AM
Place: Gould-Simpson, Room 906
Refreshments will be served in the 9th-floor atrium of Gould-Simpson at 10:45 AM.

Abstract

The Genographic Project is a research partnership of National Geographic and IBM, intended to investigate the migration history of humans across the globe through the "history book" hidden in the DNA of each of us. This project is unprecedented in scope: it targets genetic testing and analysis of 100,000 members of indigenous populations around the world and at least as many members of the general public who choose to purchase participation kits.

In this talk, I will first give a brief overview of this project and its scope. I will discuss our work on classifying and analyzing the large mtDNA database (more than 80,000 samples) already collected. I will then survey some statistical challenges that arise in this project, and concentrate on one or both of the following topics, as time permits:

1. Maximum likelihood estimation of mutation probabilities and coalescent tree sizes in mtDNA. The control region of the mitochondrial DNA mutates much more quickly than most of the DNA in our bodies. Consequently, it contains useful information for phylogenetic analysis. However, the mutation rate across this region is known to vary greatly. I will describe some data from the Genographic Project, including observed mtDNA control region mutations and haplogroup classification of several thousand individuals. The Poisson likelihood associated with the number of mutations at each locus in each haplogroup reduces to a binomial likelihood given the observed data. This maximum likelihood problem can be solved as a binomial GLM with a complementary log-log link function. The resulting estimates are useful for improving mtDNA classification models and phylogenetic analysis, and also give interesting insights into the population history of different haplogroups.

2. Power analysis of tests for discovery of admixture between modern humans and Neanderthals. I will discuss the feasibility of discovering inter-breeding between modern humans and Neanderthals in Europe through sequencing of nuclear DNA of modern Europeans. I will show how the length of regions sequenced and number of individuals sampled affect the probability of rejecting the "null" of no inter-breeding under various alternatives.
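
The link between the Poisson and binomial formulations in topic 1 can be illustrated with a short sketch. It is one reading of the abstract, not the speaker's implementation: it assumes that the observed data record only whether a locus mutated at least once within a haplogroup, and that the Poisson mean is the product of a per-locus rate and a per-haplogroup tree size. The function names and all numeric values are hypothetical. Under those assumptions the probability of at least one mutation is 1 - exp(-rate * tree_size), and the complementary log-log link turns it into a sum of a locus term and a haplogroup term, which is the form a binomial GLM can fit.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def prob_at_least_one(rate, tree_size):
    """P(N >= 1) when N ~ Poisson(rate * tree_size)."""
    return 1.0 - math.exp(-rate * tree_size)


# Hypothetical values, chosen only for illustration (not project data).
site_rates = {"site_a": 0.002, "site_b": 0.05}  # assumed per-locus mutation rates
tree_sizes = {"hg_x": 40.0, "hg_y": 150.0}      # assumed tree size per haplogroup

for site, rate in site_rates.items():
    for hg, size in tree_sizes.items():
        p = prob_at_least_one(rate, size)
        eta = cloglog(p)
        # On the cloglog scale the predictor is additive:
        # a locus effect log(rate) plus a haplogroup effect log(tree size).
        assert math.isclose(eta, math.log(rate) + math.log(size))
        print(f"{site} {hg}: p={p:.4f}  cloglog(p)={eta:.4f}")
```

In a full fit, the locus and haplogroup terms would be unknown coefficients estimated from the observed mutation indicators rather than fixed inputs as they are here.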

 

 

 
