A Major Word of Caution About Ethnicity & Admixture Calculators



moore2moore
25-02-16, 00:26
I've posted it before: the "science" behind so-called "admixture calculators" can't yet be called that. The validity of the output data is getting there, and will get there eventually, but it's not quite sufficient so that we can call the extant methods, "scientific."

In other words, the calculators exist primarily for fun. They can help identify heritage from a continent very well. And they can give a sense to adopted children of their heritage.

But those who read precision, science, or much else into the results of calculators are charlatans, and we should all be very wary of such people.

Now comes a study (http://gbe.oxfordjournals.org/content/early/2016/02/22/gbe.evw034.abstract) in Genome Biology and Evolution. It turns out that certain alleles on certain segments of the human genome will tie Chinese and Japanese with people from Kenya and Nigeria.

I hope you realize the significance of that statement. If a "calculator" were based in whole or in part on those markers, it would produce a result indicating recent admixture between two populations that haven't mixed for millennia. It might tell a Nigerian that she was part Japanese.

Other similar segments will tie Southern Europeans to Africans. Board posters regularly assume that those would be valid, with little inquiry into methods. That is a sort of confirmation bias toward preconceived notions.

With that in mind, here are the three questions the community must ask of people who rely on a "science" that is still not quite ready for prime time.

1. If a calculator produces such a misleading result with one population, how can you trust it generally? In other words, if a calculator's programmer chooses, as a subjective human, to use the markers that tell a Nigerian she is Japanese, then how can we trust the other markers that this subjective human also chose? Because there is a more plausible connection within pop culture?

2. If 30 different calculators produce 30 different results, which 29 are wrong? Would we trust thermometers that produced such widely disparate results?

3. What is the time frame a calculator purports to predict? Modern, 500 years ago, 2000 years ago, Neolithic, Paleolithic? If a calculator tells a Brit that she is 12.5% "Middle Eastern," does that mean gramma cheated on grandpa? Does it mean a colonial came to England in the 1700s? That a Roman-era soldier settled in her town from the Middle East?

Or does it simply tell her that she is European, and that a certain percentage of European genes came from Middle-Eastern-like Early Farmers? (DUH).

There is a big difference.

There is little value in telling us that human populations all converge as you go farther back in time.

Namu
02-03-16, 10:55
Just for clarity: do you mean the "Oracle" ethnicity mix predictors? I've always thought they were there purely so people could compare the "predictions" with their actual parental/grandparental ethnicity as a gauge of the precision of the calculator's actual results.

moore2moore
02-03-16, 19:13
You have a healthy attitude about them. In my experience, most people on the web do not. They view them as gospel truth, and use them to support whatever theory they are pushing.

srdceleva
15-07-16, 16:38
Well, you have some valid points, but I wouldn't call them unscientific; more like scientific but with huge inconsistencies. One of the biggest problems with autosomal admixture tests, such as on Gedmatch when you click on the Oracle and plot yourself on a sort of relationship map to other countries, is that you can end up clustering with countries you have literally no relation to. For example, if you're half Russian and half Nigerian, you may end up clustering with some country in North Africa just based on your admixture results, and then people will claim you have a genetic mix similar to some guy in Libya, when you have probably never had an ancestor from there and genetically have very little to do with those people, as you are literally half Russian and half Nigerian.

What I want to see studies do more is show not only which countries you or the samples cluster closest to, but also which countries that person or those samples share the longest segments of DNA with. That would show which countries you actually have a recent common ancestor with, and give a much better picture to work with than just which countries the person or samples cluster with.

A good example of this is the Balkan Slavs. If you read studies about Yugoslavs, Macedonians, or Bulgarians, you'll see the studies conclude that Balkan Slavs, besides Croatians and Slovenians, were not impacted much genetically by the Slavic expansion. However, when IBD segments were tested, that is, comparing the longest shared segments of DNA (the longer the segment, the more recent the common ancestor), southern Slavs shared over twice as many IBD segments with northern and western Slavs as they did with neighboring Greeks. This is a very important detail that studies need to include when making such broad conclusions, so as to give a much clearer, three-dimensional picture of genetic relationships between peoples and countries.
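
To put a rough number on "longer segment, more recent ancestor": a common rule of thumb is that a segment inherited from a shared ancestor g generations back has passed through about 2g meioses, so its expected length is roughly 100/(2g) cM. The sketch below only tabulates that approximation. The function name is made up for illustration, and real IBD detection has to deal with recombination-rate variation, phasing errors, and background relatedness, none of which is modeled here.

```python
def expected_ibd_length_cm(generations_to_ancestor: int) -> float:
    """Rule-of-thumb mean length (in cM) of a segment shared through a
    common ancestor this many generations back.

    Assumption: the segment has crossed 2 * g meioses (g down each line of
    descent), and recombination breaks it at about 1 crossover per 100 cM
    per meiosis, giving a mean length of 100 / (2 * g) cM.
    """
    if generations_to_ancestor < 1:
        raise ValueError("generations_to_ancestor must be at least 1")
    meioses = 2 * generations_to_ancestor
    return 100.0 / meioses


# Illustrative table: older shared ancestry leaves shorter segments.
for g in (2, 5, 10, 25, 50):
    print(f"{g:>2} generations back: ~{expected_ibd_length_cm(g):.1f} cM on average")
```

Under this approximation, a pair of people who share many long segments most likely have common ancestors within the last few dozen generations, which is information an admixture percentage alone does not carry.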

davef
15-07-16, 16:52
Even if the half Nigerian, half Russian gets Libya as his top match, there's a number to the right which gives the distance to an average Libyan, and I would expect it to be really high. The higher the number, the further you are genetically from someone of that population, even if it is still the closest. I've seen a half Ashkenazi, half North European get something like Romania as the top match, but the distance score was something like 13, which is ridiculously far from that population genetically, even though it's still the closest. I think I read that 5 or under is a decent to perfect fit.
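
Here is a toy version of that "closest match, but still far away" situation. It assumes the Oracle-style distance is a plain Euclidean distance between a person's admixture percentages and each population's average, which is my assumption about how such tools work. The three components and all the population averages are made-up numbers chosen only to illustrate the point, not real reference data.

```python
import math

# Hypothetical average admixture percentages per population, in the order
# [European, Sub-Saharan African, North African / Middle Eastern].
# These values are invented for illustration only.
POP_AVERAGES = {
    "Russian": [98.0, 0.0, 2.0],
    "Nigerian": [0.0, 99.0, 1.0],
    "Libyan": [30.0, 20.0, 50.0],
}


def distance(a, b):
    """Euclidean distance between two admixture vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_populations(person, averages):
    """Return (distance, population) pairs sorted from closest to furthest."""
    return sorted((distance(person, avg), name) for name, avg in averages.items())


# A person who is exactly half Russian and half Nigerian in this toy model.
mixed = [(r + n) / 2 for r, n in zip(POP_AVERAGES["Russian"], POP_AVERAGES["Nigerian"])]

ranking = rank_populations(mixed, POP_AVERAGES)
for dist, name in ranking:
    print(f"{name}: {dist:.1f}")
```

With these invented numbers the Libyan average comes out on top of the list, yet its distance is nowhere near the "5 or under" range, so the top match by itself says very little unless you also look at the distance.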

davef
16-07-16, 15:54
And tbh these admixture calculators are good for entertainment; that is, you can figure out how you relate to people of ethnicities other than your own. So if you're an English person of 100 percent English heritage who gets 3 percent Syrian, you share 3 percent of your genome, through IBS and not necessarily IBD, with a typical Syrian of the sample used. If I'm wrong, let me know, as this is just a guess.

Angela
16-07-16, 16:55
Northern Europeans can get fits of under 1 to their own or at least a very related group. Now that's an almost perfect fit. One to two isn't bad. I don't think an FST of 5 is very good.

Many if not most Italians don't get fits like that, because we have too much regional variation, even, or perhaps most often, in the North. In most calculators I get a fit of 5 or above, even to neighboring samples like Bergamo or Toscana, which I think is awful. The only ones where I get something around 2 or 3 is where they have a lot of northern Italian samples from different areas.

The best use of these calculators in terms of your own family history is actually to compare yourself to other people of your "ethnicity". If you find a strange or "atypical" percentage for one of the components you know there was someone in your tree harboring some "divergent" ancestry.

As for comparisons with other "ethnicities", it may tell you how your ethnicity differs, but any speculations about the reason for the differences have to be done very cautiously and with some understanding of the population genetics of Europe as a whole. The fact, for example, that one group scores more "southwest Asian" than another doesn't tell you why or when that became part of the genome. Only ancient dna can give you clues about that.

davef
16-07-16, 17:53
I'll admit, I got that info from this one site called "the apricity". I kinda got the vibe that I wouldn't learn much from it, and I felt like I couldn't scroll the page without getting flooded with annoying ads. How bad is 5? I know it means you're only a partial fit to the population you're being compared to, but in what proportion? So with you, when you got a fit of 5 with Tuscans, does that mean that you're only like 85 percent Tuscan or something according to that calculator?
Also, sample size is key here in my opinion. I bet if they had more samples from Bergamo or Tuscany they would, in all probability, have found one whose genome matches yours more closely. Correct me if I'm wrong.

Also, yeah, 3 percent Southwest Asian could mean recent ancestry or a more ancient connection. 3 percent is shared with Southwest Asians, if I'm correct. That doesn't necessarily make one 3 percent Southwest Asian. But 3 percent is still shared with that population regardless of where it came from, right?

RobertColumbia
26-07-16, 22:43
Another thing that these calculators do is help you interpret traditional genealogical data. A small amount of my ancestry (as traditionally documented) comes from Germany via Pennsylvania. Admixture calculators typically give me many hits in the Netherlands and northwestern Germany (e.g. Bremen area), and almost no hits in Bavaria, Switzerland, or Austria. Linking this in with the fact that my German ancestors seem to have been mostly Protestants (which are more common in northern Germany) rather than Catholics (who are more common further to the south), I can put all of this together and form a more complete picture of the likely journey of my ancestors. Is this perfect? Of course not. Does it give me a lot more confidence than a simple number on a chart? Of course it does.

davef
26-07-16, 23:05
Yeah, your German ancestry is northern according to the calculator if those groups show up. It gives you more confidence that that's where it's from.

Angela
27-07-16, 00:36
If you're trying to figure out your genealogy, looking at calculator averages can help. However, don't pay any attention to the things that certain people are posting as supposedly legitimate results of their "dna relatives" or "cousins", or even "friends". There is no way to verify the identity of those people, their precise origin, or that their scores haven't been "creatively" doctored.

Folks... Beware!

http://www.newmediaandmarketing.com/wp-content/uploads/2016/06/pt-barnum-quote-theres-a-******-born-every-minute.jpg

And two to fleece him!

I know, I know, it wasn't really P.T. Barnum!

Ed.
http://image.slidesharecdn.com/20150603backbasebuildingsmarterbank-150603151034-lva1-app6892/95/building-a-smarter-bank-with-ron-shevlin-2-638.jpg?cb=1433355045

davef
27-07-16, 00:48
Your link leads to a dead page

Wait, sorry. What you wanted us to see is in the second link. d'oh