Have you ever wondered how autosomal SNPs are associated with specific populations across the world? The answer is that allele frequency data are used to associate alleles with particular population regions.

Allele frequency databases:
- ALFRED (http://alfred.med.yale.edu/alfred/index.asp)
- dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/)

Theoretically, any marker whose allele frequencies differ between ancestral populations, known as an ancestry informative marker (AIM), can be used for admixture mapping. The ideal AIM has one allele that is fixed in one population (p = 1.0) and absent from another (p = 0). However, most alleles are shared among populations. Hence, it is important to identify and choose the most informative AIMs across populations.

HapMap allele frequency data: http://www.hapmap.org/ (March 13, 2007 release)

The Affymetrix 500 K allele frequency data may be downloaded from http://www.affymetrix.com/.

The Illumina 100 K allele frequency data may be downloaded from http://www.illumina.com/.

The most widely used measure of how informative a marker is for ancestry between 2 parental populations is the absolute difference in the frequency of a particular allele between those 2 populations. If we let p11 represent the frequency of a reference allele in the first parental population and p21 the frequency of the same allele in the second parental population, then the delta value is given by δ = |p11 - p21|. A marker with a delta value of 1 provides perfect information regarding ancestry, whereas a marker with a delta value of 0 carries no information for ancestry.
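To make the delta calculation concrete, here is a minimal Python sketch. The marker names and allele frequencies are made up for illustration and do not come from HapMap, Affymetrix, or Illumina data; the helper names `delta` and `rank_by_delta` are likewise just assumptions for this example.

```python
def delta(p11: float, p21: float) -> float:
    """Absolute difference in reference-allele frequency between two
    parental populations: delta = |p11 - p21|."""
    for p in (p11, p21):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
    return abs(p11 - p21)


def rank_by_delta(freqs: dict) -> list:
    """Return (marker, delta) pairs sorted from most to least informative."""
    scored = [(name, delta(p11, p21)) for name, (p11, p21) in freqs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical markers: (frequency in population 1, frequency in population 2)
markers = {
    "snp_c": (0.5, 0.5),    # same frequency in both populations
    "snp_a": (1.0, 0.0),    # fixed in one population, absent in the other
    "snp_b": (0.75, 0.25),  # intermediate case
}

ranked = rank_by_delta(markers)
for name, d in ranked:
    print(f"{name}: delta = {d:.2f}")
```

With these invented frequencies, snp_a gets a delta of 1 (perfectly informative), snp_b gets 0.5, and snp_c gets 0 (no ancestry information), so sorting by delta is one simple way to shortlist candidate AIMs.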

Based on this methodology, Baye et al., 2009 (http://www.biodatamining.org/content/2/1/1), found that 17.3%, 2.6%, and 1.3% of the characterized SNPs in the HapMap, Affymetrix, and Illumina databases, respectively, were 100% noninformative for ancestry. More importantly, by comparing the HapMap SNP database against the Affymetrix 500 K and the gene-centric Illumina 100 K SNP chips, they found that a relatively large fraction (> 80%) of SNPs in these databases do not meet the cutoff for acceptable ancestry informative markers, which means that they are either of very low frequency or not ancestry informative between the 2 ancestral populations. In the HapMap data, only 30 of the interpopulation marker comparisons had very large frequency differences or were 100% informative for ancestry (delta = 1) between the 2 ancestral groups. The scarcity of 100% informative SNPs in these findings is consistent with prior studies showing that most DNA variation is shared among human populations.

Are admixture tests based on these SNPs relevant, and if so, why so?

