23andMe 23andMe's methodology artificially purifies your ancestry results

Tomenable · Apr 10, 2019

This was originally posted on Anthrogenica by user Kurd (creator of some of the GEDmatch calculators) in this thread:

https://anthrogenica.com/showthread...stry-Composition-or-GEDmatch-calculators-more

Quote:

"23andMe's speculative mode greatly overestimates major components and underestimates minor components. This is due to their methodology of snipping the genome into 100-SNP segments to compare against the limited references they have. So, for example, if 60% of a segment indicates Middle Eastern and 40% indicates South Asian, that segment is assigned 100% Middle Eastern. In effect, the 40% of the segment that is South Asian is ignored, and the whole segment is assigned Middle Eastern.

Also, their methodology includes segment smoothing, which means if there are chunks of minor components in a segment, they are ignored.

That is how Iranians and West Asians turn out 98-100% Middle Eastern, and folks in neighboring Pakistan turn out 98-100% South Asian in speculative mode.

This naturally is unrealistic and uninformative, because you don't need a test to tell you that. Conservative mode is better with regards to inflation of major components and underestimation of minor components, but the trouble here is that people get 5-70% unassigned. This is where your minor components are hidden.

The above translates to 23andMe being useless for figuring out your minor components to any degree of accuracy."
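Taken literally, the mechanism Kurd describes (give each 100-SNP window the single label of its majority ancestry, leave it unassigned when that majority is not confident enough, then smooth away isolated minority windows) can be sketched as a toy. This is a minimal illustration under stated assumptions, not 23andMe's actual classifier: the per-SNP ancestry labels, the two `threshold` values standing in for "speculative" and "conservative" modes, and the neighbour-based `smooth` helper are all hypothetical simplifications.

```python
from collections import Counter


def assign_windows(snp_labels, window_size=100, threshold=0.5):
    """Cut per-SNP labels (hypothetical) into fixed windows and give each
    window ONE label: its majority label if that label's share reaches
    `threshold`, otherwise "Unassigned"."""
    calls = []
    for start in range(0, len(snp_labels), window_size):
        window = snp_labels[start:start + window_size]
        label, count = Counter(window).most_common(1)[0]
        calls.append(label if count / len(window) >= threshold else "Unassigned")
    return calls


def smooth(calls):
    """Crude majority filter: a window whose two neighbours agree with
    each other but not with it is overwritten by the neighbours' label."""
    out = list(calls)
    for i in range(1, len(calls) - 1):
        if calls[i - 1] == calls[i + 1] != calls[i]:
            out[i] = calls[i - 1]
    return out


def composition(calls):
    """Percentage of windows carrying each label."""
    total = len(calls)
    return {label: 100 * n / total for label, n in Counter(calls).items()}


# Toy genome: 50 windows, every one 60% "Middle Eastern" / 40% "South Asian".
genome = (["Middle Eastern"] * 60 + ["South Asian"] * 40) * 50

print(composition(assign_windows(genome, threshold=0.5)))  # {'Middle Eastern': 100.0}
print(composition(assign_windows(genome, threshold=0.9)))  # {'Unassigned': 100.0}
print(smooth(["ME", "ME", "SA", "ME"]))                    # ['ME', 'ME', 'ME', 'ME']
```

Even though every window is 40% South Asian, the low threshold reports 100% Middle Eastern and the high threshold reports 100% unassigned, which mirrors the two complaints in the quote; the smoothing step then erases a lone minority window.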

I don't think doing that is wrong or right, but people should understand 23andMe shows recent and geographical (not deep or ethnic/"racial") ancestry.

That's why they claim what they do about the timeframe (that their test goes back only a few centuries, to the Early Modern Era).

Joey37 · Apr 10, 2019

Also, they do a lousy job at detecting Low German: by simply doing the math from my (cyber) paper trail, I determined that all of my 15% Broadly Northwest European is my Low Saxon ancestry (from both the Netherlands and Germany).

Regio X · Apr 10, 2019

Tomenable said:
I don't think doing that is wrong or right, but people should understand 23andMe shows recent and geographical (not deep or ethnic/"racial") ancestry.
That's why they claim what they do about the timeframe (that their test goes back only a few centuries, to the Early Modern Era).

Hmm... Perhaps 23andMe should include even more informative clusters, like ANE, WHG, etc.?

Seriously, I don't get it. I certainly respect his point of view, but my opinion is different. I'm not sure what exactly he wants: just knowing shared ancient ancestries rather than actual recent ancestry? Ancient direct influences? And how far back?
Well, I really don't know the details of 23andMe's methodology, but no, apparently that's not "why" they claim what they do about the timeframe. It's "because" of the established timeframe. In fact, they are being criticized here for doing what seems a very complex job, something other commercial companies would like to do, and in fact try to do, without such success. For example, if a company says a full Italian is 100% Italian without compromising other clusters (a pretty difficult thing), then this company must be great. Imo the supposed flaw is in fact a virtue. Additionally, commercial companies generally have to choose just one kind of calculator, not several, as we have on GEDmatch.
Kurd is great and has made good algorithms (I bought some; Geneplaza's, right?), but I wonder if he himself could do the same as 23andMe. 98-100%? Wow! This seems realistic and informative, because some people, especially mixed people, "do need a test to tell you that", and in fact that's what they're hoping for, right? I'm not sure "Oracles", for example, would do the job more accurately. Again, I wonder what exactly he wants. That 23andMe should say you're 10% West Asian, 10% East Med and this kind of stuff? That it should keep the current clusters and make them more flexible? Probably the latter. I don't know much about this, but I guess that would be even easier for 23andMe or anyone else. But come on! No problem with these more flexible approaches, but they're certainly not the purpose of this kind of commercial test.
So, imo, 23andMe is certainly not perfect, but its methodology seems right. Now, what it could, should, and probably will do is add more sub-regional clusters for Western Asian, South Asian, etc.

https://www.23andme.com/ancestry-composition-guide/

Angela · Apr 10, 2019

Yes, well, while his Ancient Calculator was excellent, his other ones are horrible for Europeans, so... I am neither Bulgarian nor Albanian.

A lot of the other amateur calculators are horrible too. Why does Gedmatch still carry the J Test? It's a joke. For Italians, in particular, you'd need to include a LOT of reference samples to get decent fits.

Plus, 23andme isn't in the business of telling you about your ancient ancestors, and neither is Ancestry, which is the other decent one (the rest are terrible). That just isn't of interest to their customers.

If you want to try to match yourself against ancient samples you can use the raw data.

suebiking · Apr 10, 2019

With all the problems that it has, it is still reliable if you are a third- or fourth-generation immigrant to the US or Canada, for instance.
Though to me, the major problem is actually classifying ancient admixture (what is from where); it isn't always that simple. For example, let's say that around the year 1000 a man from what is now Poland went to what is now France. He had children who stayed in France, then went back to Poland and had more children there. For the sake of simplicity, let's pretend that since then there were no major demography-changing events, and that this man's DNA is somehow still found in a discernible percentage in a descendant today.
Would his DNA be classified as Polish or French? The easy answer is Polish, but if the descendants used as references for the calculator are French, then it will be French. And even if you build a calculator using only DNA from people who were buried in a certain location in the year 1000, this problem can still arise.
So what I believe is that we can only build a truly reliable calculator if the samples are taken from a certain time period and are known to come from a certain location in that period.
I hope I made myself understood, since even for me it is pretty complex to explain; still, I hope I did not take the thread too far off topic.

Angela · Apr 12, 2019

I just saw a post by Razib Khan saying he's impressed by the sub-regional breakdown 23andme produces. So am I.

They got it exactly right for me without my ever telling them anything about my ancestry. Well, they don't show the eastern Ligurian, but I've come to the conclusion that eastern Ligurians are a mix of western Ligurians, Emilians and Tuscans. That would explain why Emilia is the closest match, but undoubtedly still "south" of my father's score.

As I said upthread, if you want "ancient" ancestry, get the raw data and use other tools.

Hold on, they've gotten really specific. There are dark blue intrusions into the eastern Ligurian Alps, what looks like that border area where Lombardia, Piemonte, and Liguria meet (they speak Ligurian), and from which we have a reference sample, and what may be Sarzana and La Spezia, which are eastern Liguria.

Is the southern part of Tuscany not included in the dark blue and is instead shoved into Lazio??? That would also make sense.

Well done, well done! Excellent!

TardisBlue · Apr 12, 2019

My regional breakdown is spot on for Italy… BUT, I have provided them with genealogical information. As far as my French goes, they completely failed to pick it up, as I've only got 3.8% undefined French and German, and no regional breakdown. The latest update (Beta version) has, however, increased my F&G to 9.8% and decreased my Italian to 49.5% (a 16-point decrease), while increasing my Balkan and Greek by 1 point. I'm pretty sure that if I have my Mom tested and her results phased against mine, my results will be more accurate and the F&G will go up (hopefully with a regional breakdown).

Regio X · Apr 12, 2019

TardisBlue said:
My regional breakdown is spot on for Italy… BUT, I have provided them with genealogical information. As far as my French goes, they completely failed to pick it up, as I've only got 3.8% undefined French and German, and no regional breakdown. The latest update (Beta version) has, however, increased my F&G to 9.8% and decreased my Italian to 49.5% (a 16-point decrease), while increasing my Balkan and Greek by 1 point. I'm pretty sure that if I have my Mom tested and her results phased against mine, my results will be more accurate and the F&G will go up (hopefully with a regional breakdown).

Apparently trio-phasing does make a big difference, mainly by increasing recall, I'd guess. The higher the level (sub-regional -> regional -> continental), the higher the precision and recall. French & German, btw, is a cluster with low recall: just 20%. That means a good chunk of it must be "hidden" as Broadly Northern European, which is not necessarily an imprecision. See, you got 9.8%, but that doesn't mean they are wrongly assigning most of your other actual F&G %. A low recall is generally somewhat balanced by the precision of the "competing" clusters. So it's as if those other F&G percentages are subject to additional checks, so to speak, tied to the precision of the other clusters (generally high), and in your case they would tend to be assigned to Broadly Northern European, which in turn has good precision and recall. F&G is a really difficult cluster anyway; I'm not sure they'll manage to "solve" its low recall.
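One way to picture why low recall for French & German need not mean misassignment: if each segment is reported at the most specific level whose summed probability clears a confidence threshold, an uncertain F&G segment rolls up into its broad parent cluster rather than landing in a wrong sibling. The sketch below is an illustration under assumptions, not 23andMe's published pipeline: the tree, the cluster names, the segment probabilities, and the 0.5 threshold are all made up for the example.

```python
# Hypothetical slice of a population tree: child -> parent.
PARENT = {
    "French & German": "Northwestern European",
    "British & Irish": "Northwestern European",
    "Scandinavian": "Northwestern European",
    "Italian": "Southern European",
    "Northwestern European": "European",
    "Southern European": "European",
}


def lineage(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def call_segment(leaf_probs, threshold=0.5):
    """Report the most specific node whose summed probability reaches threshold."""
    totals = {}
    for leaf, p in leaf_probs.items():
        for node in lineage(leaf):  # a leaf's probability also counts for every ancestor
            totals[node] = totals.get(node, 0.0) + p
    # Try the deepest nodes (longest lineage) first; sorted() is stable.
    for node in sorted(totals, key=lambda n: len(lineage(n)), reverse=True):
        if totals[node] >= threshold:
            # Mimic the thread's "Broadly ..." labels for non-leaf calls.
            return node if node not in PARENT.values() else "Broadly " + node
    return "Unassigned"


confident = {"French & German": 0.7, "British & Irish": 0.3}
uncertain = {"French & German": 0.45, "British & Irish": 0.35,
             "Scandinavian": 0.10, "Italian": 0.10}
print(call_segment(confident))  # French & German
print(call_segment(uncertain))  # Broadly Northwestern European
```

If the `uncertain` segment truly is French & German, it counts against F&G recall, yet the call is still correct at the broader level, which is the distinction being drawn above.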

That said, there is naturally still a margin of error, not just as suggested by the very numbers provided by 23andMe, but also because some countries are just complicated. I doubt their references cover all of Italy, for example, which would mean some results don't fit their statistics, which aren't based on "all" kinds of Italians. Trio-phasing made my results almost perfect. However, my mother is still getting too much Northern % (31.9% F&G + 1.3% B&I + 5.4% Broadly NW. E.). Yes, it may be shared ancestry, but that doesn't matter, given the purpose of Ancestry Composition. She has been Italian in ancestry for certainly more than 500 years, and 23andMe's statistics don't fit her results. Nor do they fit my father's (though his results are not as far off as my mother's). Now I want to know how they will solve that without compromising other clusters.
Anyway, this likely doesn't apply to France (the country) to the same degree, so I guess the related precision and recall are more realistic there.
Still, I believe Ancestry Composition is the best. Besides, there is the tool showing which regions you likely match, etc. It compensates a bit. My mother is well placed by this tool.

@Angela
Sorry! I got more Italian than you in the new version: 85%.

Angela · Apr 12, 2019

Regio X said:
@Angela
Sorry! I got more Italian than you in the new version: 85%.

They clearly made a terrible error.

That was all on the v3 chip, so a lot more SNPs, I think.

This is the "Italian" map:

This is v4/v5: they use fewer SNPs. I don't know if they're perhaps "better" SNPs or not, but it does produce slightly different results.

The Italian map:

Obviously, since the older version says I'm more Italian, that's the correct one.

I wonder if, when they run the reference samples, they use only the SNPs on the chip you have. That might make a difference.

Tomenable · Apr 12, 2019

Well, as for 23andMe's recently introduced sub-regional matches: I was asking Living DNA customer service about my updated Living DNA results, and I also asked them what they think of the recent 23andMe updates.

They told me:

Quote: "(...) Regarding your question about 23andMe - their regions are not DNA based. They are records/cousin matching based. If you share some distant DNA with people living in a particular region, they add that region as a part of your breakdown. So comparing the number of regions offered by different DTC companies became more difficult, because we are comparing two different things. (...)" - wrote a person from Living DNA team.

So this new "recent ancestors feature" of 23andMe is generally based on where your matches are from. It is not based on ancestry-informative markers (ancestry-informative SNPs) in your own DNA, but on segments that you share with other people.

In other words, if you use the "GEDCOM + DNA matches" feature on GEDmatch and go through the family trees of all your matches, you can figure out something very similar (which regions your closest matches are from). Of course, it would take a lot of time, and 23andMe does it for you, so kudos to them for this. I just wonder if they check the birthplaces of your matches' great-grandparents, or report this based only on your matches' current places of residence.
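The do-it-yourself version of this can be sketched as a simple tally. Everything here is an assumption for illustration: that you have already noted one region per match (from the birthplaces in their trees, or from current residence, which is exactly the open question above), and that weighting each match by shared centimorgans is a sensible choice. It is not Living DNA's or 23andMe's documented method, and the match list is invented.

```python
from collections import defaultdict


def region_shares(matches):
    """matches: iterable of (region, shared_cM) pairs.

    Returns each region's fraction of the total shared cM, so closer
    relatives (more shared DNA) count for more than distant ones."""
    totals = defaultdict(float)
    for region, shared_cm in matches:
        totals[region] += shared_cm
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {region: cm / grand_total for region, cm in totals.items()}


# Hypothetical match list: region taken from each match's family tree.
matches = [("Mazovia", 40), ("Lesser Poland", 20), ("Mazovia", 20), ("Silesia", 20)]
print(region_shares(matches))  # {'Mazovia': 0.6, 'Lesser Poland': 0.2, 'Silesia': 0.2}
```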

I've ordered 23andMe and am waiting for my results; we'll see which Polish regions they report for me.

Regio X · Apr 13, 2019

Angela said:
Obviously, since the older version says I'm more Italian, that's the correct one.

This is controversial.

The maps are really cool. They place us perfectly in Veneto, and the others I've seen, including yours, are generally right, or almost right. Pretty intelligent tool!

We tested on v4. The 23andMe results changed a short time ago. Mine and my father's got much better, while my mother's are still close to the old ones, like yours, apparently.
If you check the numbers provided by 23andMe in its guide, you'll see that yours (similar to my father's), like my mother's, don't fit. In an ideal situation, we should be categorized as Italian, Broadly Southern European and Broadly European. That said, the precision of the Northern European cluster is supposedly 96%. So, in theory, according to this last update, you're almost surely 19% Northern European, while my mother would be at least ~37%. There must be a margin, of course; still, these are not exactly low numbers, so if the statistics were right, the probability that a good chunk of them is correct should tend to 100%. This is the way I see it.
So we know this is not true, because our ancestries are actually fully Italian, meaning that the precision/recall statistics are not that accurate, at least when it comes to Italy, a country with relatively high genetic diversity. Maybe their Italian references come from relatively few areas?
Or perhaps the precision and recall percentages are valid just for the conservative mode? I don't even know if it still exists. If so, why doesn't 23andMe specify that?
Yet the timeline says my mother has some "foreign" ancestry within the last 2-4 generations, for example. Wrong.
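The back-of-the-envelope reasoning about the 96% figure can be written out as a worked calculation. It assumes, as the post does, that the quoted 96% precision for the Northern European level can be applied to the whole reported northern share, reading precision as the expected fraction of those calls that are correct; lumping the separate F&G, B&I and Broadly NW. E. figures under one precision value is a simplification.

```python
def expected_correct_share(reported_pct, precision):
    """Expected % of the genome truly from the reported population,
    if `precision` is the fraction of such calls that are right."""
    return reported_pct * precision


NORTHERN_PRECISION = 0.96  # figure quoted from 23andMe's guide in this thread

angela = 19.0              # Northern European % attributed to Angela above
mother = 31.9 + 1.3 + 5.4  # F&G + B&I + Broadly NW. E. for Regio X's mother

print(round(expected_correct_share(angela, NORTHERN_PRECISION), 1))  # 18.2
print(round(expected_correct_share(mother, NORTHERN_PRECISION), 1))  # 37.1
```

Even after discounting for imperfect precision, the expected correctly called northern share stays far above zero, which is why the post concludes the published figures cannot hold for testers whose documented ancestry is fully Italian.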

From the guide:
"Precision answers the question 'When the system predicts that a piece of DNA comes from population A, how often is the DNA actually from population A?' Recall answers the question 'Of the pieces of DNA that actually are from population A, how often does the system correctly predict that they are from population A?'"

I just checked. They say they use 606 reference individuals for the Italian cluster. Plus:
"The reference datasets are made up of individuals from publicly available datasets including the Human Genome Diversity Project, HapMap, and the 1000 Genomes project, as well as individuals from private 23andMe data collections and a large number of 23andMe customers who have consented to participate in research. In total, there are 10,419 research-consented customers and 2,612 non-customers in these population reference datasets."

Apparently, a lot of people. I'm confused.

@Tomenable
So they use IBD, indirectly.
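The guide's two definitions quoted in this post translate directly into counts over segments. The toy labels below are invented to reproduce the 20% French & German recall mentioned earlier in the thread alongside perfect precision; they are not 23andMe data.

```python
def precision_recall(true_labels, predicted_labels, population):
    """Precision and recall for one population over paired segment labels."""
    pairs = list(zip(true_labels, predicted_labels))
    hits = sum(t == population and p == population for t, p in pairs)
    predicted = sum(p == population for _, p in pairs)  # calls made for the population
    actual = sum(t == population for t, _ in pairs)     # segments truly from it
    precision = hits / predicted if predicted else 0.0
    recall = hits / actual if actual else 0.0
    return precision, recall


# 10 segments that really are French & German: 2 called F&G, 8 rolled up
# to the broader cluster. 5 British & Irish segments, all called correctly.
truth = ["French & German"] * 10 + ["British & Irish"] * 5
called = (["French & German"] * 2 + ["Broadly Northwestern European"] * 8
          + ["British & Irish"] * 5)

print(precision_recall(truth, called, "French & German"))  # (1.0, 0.2)
```

Every F&G call here is right (precision 1.0), yet only 2 of the 10 true F&G segments are called F&G (recall 0.2): low recall alone does not mean segments were assigned to the wrong population.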

Angela · Apr 13, 2019

Regio X said:
Apparently, a lot of people. I'm confused.

@Tomenable
So they use IBD, indirectly.

It may be a lot of people, but it's all relative. I would bet they have tens of thousands of people with different types of Northern European ancestry. Plus, they're very homogeneous.

I don't think they have a lot for certain areas of Italy, especially in the north. Italian-Americans do test, but the vast majority are from southern Italy and Sicily.

Maybe they have quite a few from your Dad's area of the Veneto but not from your mother's. If some obsessed person tests twenty relatives, you're going to get a lot of similarity.

No one in my area(s) seems to be at all interested, so I'm not surprised they can't assign some of me as "Italian". After all this time I only know a handful of people from eastern Liguria who have tested, and one or two from the Emilian Apennines. A lot of my "Italian" matches are actually Argentines from La Spezia, Genova and Parma who were part of a medical study.

On the "relatives" feature where they do indeed use IBD analysis, my only Italian matches are from Emilia, Toscana, and Liguria, (usually partial ancestry at that), and the rest are non-Italian. In fact, they tell me that 80% of my "matches" report "French-German" ancestry. I've had two families with Danish ancestry since the beginning, and it's clearly not recent.

The number of SNPs on the chip is another factor. I'm 83% Italian on the V2/V3 chip, which has a lot more SNPs.

TardisBlue · Apr 13, 2019

Thanks for your explanations Regio X, I don't know much at all about DNA testing and the knowledge of members here is impressive. What puzzles me is my low Broadly NW European, only 8.6%. If you add the Broadly European and F&G, that only makes 23.3%. If we include the British and Irish, that makes 27.3%. I also have 2.1% unassigned. The rest is Italian / Greek Balkan / 1.3% Spanish and Portuguese, and Broadly Southern Euro (15.5%), with a tiny bit of W. Asia and N. Africa. So all in all, my 23andme results seem to be more south shifted - for now.

Regio X · Apr 13, 2019

Angela said:
I don't think they have a lot for certain areas of Italy, especially in the north. Italian-Americans do test, but the vast majority are from southern Italy and Sicily.

Yeah, you must be right. Most testers with Italian ancestry are Americans, so the supposed lack of references may explain their statistics, which are naturally built only on what they have on hand. Anyway, trio-phasing really works, and my results suggest that a good chunk of my mother's N. European could be, yes, categorized as Italian, and in fact is Italian. Indeed, according to 23andMe itself, everything I inherited from her is Italian: 50%! Very interesting!

Regio X · Apr 13, 2019

TardisBlue said:
Thanks for your explanations Regio X, I don't know much at all about DNA testing and the knowledge of members here is impressive. What puzzles me is my low Broadly NW European, only 8.6%. If you add the Broadly European and F&G, that only makes 23.3%. If we include the British and Irish, that makes 27.3%. I also have 2.1% unassigned. The rest is Italian / Greek Balkan / 1.3% Spanish and Portuguese, and Broadly Southern Euro (15.5%), with a tiny bit of W. Asia and N. Africa. So all in all, my 23andme results seem to be more south shifted - for now.

And what is your actual ancestry/admixture? And which region(s) of France?

TardisBlue · Apr 13, 2019

Regio X said:
And what is your actual ancestry/admixture? And which region(s) of France?

Actual ancestry is 50% Campania on the paternal side. Maternal side: 37.5% Champagne-Ardenne (NE France) + 12.5% Haute-Savoie (not far from the Italian border, so that could be interpreted as Italian, though I don't know how much DNA I may have inherited from my great-grandfather). FTDNA, on the other hand, gives me 55% W. and Central Europe (and only 17% SE Europe + 19% Middle Eastern); that's why I'm puzzled by my 23andme results. I'm sure phasing would "fix" it though, so I'm considering having my mom tested at some point (she tested with MyHeritage; I uploaded her data to Gedmatch, where she comes up as mostly French or Dutch).

Angela · Apr 13, 2019

Fwiw, I get the same similarity number for Savoy as for Lombardy, so they're probably very "Italian" like. That would bring you up to 72.5%, which is pretty close to what you get, isn't it? Wasn't it 75% Italian?

Ed. 62.5%

TardisBlue · Apr 13, 2019

Angela said:
Fwiw, I get the same similarity number for Savoy as for Lombardy, so they're probably very "Italian" like. That would bring you up to 72.5%, which is pretty close to what you get, isn't it? Wasn't it 75% Italian?

Not that much

I got 66% with the normal version, and the latest Beta version brings it down to 49.6%, which does reflect my actual ancestry. However if you add the Broadly Southern European plus the Greek and Balkan, it's still "too much" compared to my actual ancestry. Not that I mind though

It just puzzles me.

paul333 · Apr 16, 2019

I had a slight change to my 23andMe results, due to the update I think: they still show 100% European (same), but now 98.3% Northwest European.

The remaining 1.7% is split between Southern European 1.2% (Spanish & Portuguese 0.2% and Broadly Southern European 1.0%) and Broadly European 0.5%.

Nicu · Jan 30, 2022

Tomenable said:
Quote:

"23andMe's speculative mode greatly overestimates major components and underestimates minor components. This is due to their methodology of snipping the genome into 100-SNP segments to compare against the limited references they have. So, for example, if 60% of a segment indicates Middle Eastern and 40% indicates South Asian, that segment is assigned 100% Middle Eastern. In effect, the 40% of the segment that is South Asian is ignored, and the whole segment is assigned Middle Eastern.

The above translates to 23andMe being useless for figuring out your minor components to any degree of accuracy."

I don't think doing that is wrong or right, but people should understand 23andMe shows recent and geographical (not deep or ethnic/"racial") ancestry.

That's why they claim what they do about the timeframe (that their test goes back only a few centuries, to the Early Modern Era).

I have to say that this may have some truth to it. I get the impression that over time they have worked on "consolidating" more of the ancestry reports into bigger, cleaner, major categories, often overlooking minor amounts of ancestry that differ from those or just stuffing/subsuming them into the bigger category. When I first took the test ten years ago, it reported a roughly corresponding or expected amount of Italian ancestry (around 7-8%), given my one eighth from my great-grandmother, who was actually born there. Over time, they've made that smaller and smaller, to the point where it is now 0% lol. I'm like 99.7% Balkan now on 23andme. I also wonder if they adjust things based on what you enter as your family's known ancestry, to fit it more easily, since I removed that part out of curiosity just to see what would happen. A similar thing seems to have happened on Ancestry.com as well.

Also, the regions/localities where they place you within a country for me are pretty surprising and often don't match with expected results and known family history.

Anyway, 23andme isn't really about ancient ancestry much at all, and I don't think they claim to be. It's about the last 500 years or so. If you're looking for deeper ancestry, there are other services, or you can use the raw data from services like 23andMe and Ancestry for that.
