Angela
See:
Mathieson et al
http://mathii.github.io/2018/10/05/estimating-archaic-ancestry-from-23andme-data
"23andMe don’t actually report Neanderthal ancestry proportions - rather, they report the number of Neanderthal-derived variants you carry out of a set of around 1400 ascertained in Sankararaman et al 2017. The procedure is described here. They also report which quantile you are in, in terms of these variants, and how you rank relative to people you know. Unfortunately these estimates don’t correspond very well to Neanderthal ancestry proportions."
[h=3]An indirect estimate of Neanderthal ancestry[/h]
"Among people with the same genome-wide ancestry, there is likely very little variation in Neanderthal ancestry. So an indirect way of estimating Neanderthal ancestry would be to assume that Neanderthal ancestry proportion is just a function of genome-wide ancestry, and estimate that function. To do this, we first estimate Neanderthal ancestry in the SGDP, as in Pruefer et al. Then, we restrict the SGDP data to SNPs that are found on the 23andMe genotyping array and compute its principal components. Next, we can project our 23andMe results onto the PCs defined by the SGDP. Finally, we fit a model to the SGDP data that expresses Neanderthal ancestry as a function of the first few PCs. So when we project our 23andMe data onto the SGDP PCs, this model gives us an estimate of Neanderthal ancestry. The kind of model we need to fit is called a generalised additive model and can be fit using the R package mgcv.
[h=3]Indirect, but not that interesting.[/h]
I computed the statistics described above and wrote an R script to project 23andMe genotype data. The figure below shows the first two principal components of the SGDP, and the contour lines show the corresponding proportions of Neanderthal ancestry. You can generate this figure and project your own 23andMe data onto the PCs using the scripts here. It’s not that interesting though, because Neanderthal ancestry is just a function of genome-wide ancestry, so two people with similar genome-wide ancestry will always get the same Neanderthal ancestry estimate.
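For anyone who wants to see the mechanics of the projection-and-fit procedure described in the quoted post, here is a rough Python sketch. It is mine, not the blog author's R script. Everything in it is an assumption for illustration: the reference panel is simulated rather than the SGDP, the 1% and 2% ancestry values are made up, and a plain linear least-squares fit stands in for the generalised additive model that the post fits with mgcv.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the reference panel (the post uses the SGDP):
# 60 individuals x 500 SNPs, genotypes coded 0/1/2, simulated from two
# synthetic populations with different allele frequencies.
n_ref, n_snps = 60, 500
pop = np.repeat([0, 1], n_ref // 2)
freqs = rng.uniform(0.1, 0.9, size=(2, n_snps))
ref_geno = rng.binomial(2, freqs[pop])           # shape (n_ref, n_snps)

# Made-up archaic ancestry values for the reference individuals.
ref_neand = np.where(pop == 0, 0.010, 0.020) + rng.normal(0, 0.001, n_ref)

# Step 1: principal components of the reference panel, assuming the SNPs
# have already been restricted to those on the genotyping array.
snp_mean = ref_geno.mean(axis=0)
centered = ref_geno - snp_mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt[:2].T                              # shape (n_snps, 2)
ref_pcs = centered @ loadings                    # shape (n_ref, 2)

# Step 2: model ancestry as a function of the first two PCs.
# Linear least squares here, as a simple stand-in for a GAM.
design = np.column_stack([np.ones(n_ref), ref_pcs])
coef, *_ = np.linalg.lstsq(design, ref_neand, rcond=None)


def project(genotypes):
    """Project one genotype vector onto the reference PCs."""
    return (genotypes - snp_mean) @ loadings


def predict(genotypes):
    """Estimate ancestry from a sample's position in reference PC space."""
    pcs = project(genotypes)
    return coef[0] + pcs @ coef[1:]


# Step 3: a new individual simulated to resemble population 1.
new_sample = rng.binomial(2, freqs[1])
estimate = predict(new_sample)
print(f"projected PCs: {project(new_sample)}, estimate: {estimate:.4f}")
```

Note that the new sample is centred with the reference panel's SNP means and projected with the reference loadings. It is never included in the PCA itself, which is what "projecting onto the PCs defined by the SGDP" means in the post.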
[h=3]Other approaches.[/h]
If you wanted to get a fully personalized estimate, you could combine this with an f4 ratio on the data itself - e.g. you could use this estimate as a prior, and use the f4 ratio and error to compute a posterior. I’m not sure that this would actually help very much, since the uncertainties are so high. But you might be able to identify people (if there are any) who have particularly extreme Neanderthal ancestry proportions. With large datasets you could recompute f4 ratio proportions and principal components on that dataset itself and get more accurate estimates, as well as individual level estimates."
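To get a feel for the prior-plus-f4-ratio idea, here is a small Python sketch of a normal-normal update. This is my own illustration, not something from the blog post: it assumes both the ancestry-based estimate and the f4 ratio can be treated as normally distributed, and all four numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, obs, obs_se):
    """Combine a normal prior with a normally distributed observation.

    The posterior mean is a precision-weighted average of the prior mean
    and the observation; the posterior precision is the sum of the two.
    """
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_se ** 2
    post_var = 1.0 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs)
    return post_mean, post_var ** 0.5


# Hypothetical inputs: a PC-based prior of 2.0% with sd 0.2%, and a noisy
# per-individual f4 ratio of 3.0% with standard error 1.0%.
post_mean, post_sd = normal_posterior(0.020, 0.002, 0.030, 0.010)
print(f"posterior mean {post_mean:.4f}, posterior sd {post_sd:.4f}")
```

With a standard error this much larger than the prior's spread, the posterior barely moves from the prior. That matches the quoted caveat that the f4 ratio might not help very much when its uncertainty is high.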