Eupedia Forums

Thread: 23andMe's methodology artificially purifies your ancestry results

  1. #1
    Elite member

    Join Date
    07-09-14

    Y-DNA haplogroup
    R1b
    MtDNA haplogroup
    W6

    Ethnic group
    Polish
    Country: Poland




    This was originally posted on Anthrogenica by user Kurd (creator of some of the GEDmatch calculators) in this thread:

    https://anthrogenica.com/showthread....lculators-more

    Quote:

    "23andMe's speculative mode greatly overestimates major components and underestimates minor components. This is due to their methodology of snipping the genome into 100-SNP segments to compare against the limited references they have. So, for example, if 60% of a segment indicates Middle Eastern and 40% indicates South Asian, that segment is assigned 100% Middle Eastern. In effect, the 40% of the segment that is South Asian is ignored, and the whole segment is assigned Middle Eastern.

    Also, their methodology includes segment smoothing, which means if there are chunks of minor components in a segment, they are ignored.

    That is how Iranians and West Asians turn out 98-100% Middle Eastern, and folks in neighboring Pakistan turn out 98-100% South Asian in speculative mode.

    This naturally is unrealistic and uninformative, because you don't need a test to tell you that. Conservative mode is better with regard to inflation of major components and underestimation of minor components, but the trouble there is that people get 5-70% unassigned. This is where your minor components are hidden.

    The above translates to 23andMe being useless for figuring out your minor components to any degree of accuracy."
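The effect described in the quote can be illustrated with a toy sketch in Python. It assumes each 100-SNP window comes with hypothetical per-population probabilities, and it contrasts a winner-take-all call per window (plus a simple mode filter standing in for "segment smoothing") with averaging the probabilities. This illustrates the argument as stated; it is not 23andMe's actual algorithm.

```python
from collections import Counter


def hard_calls(windows):
    """Assign each window wholly to its highest-probability population."""
    return [max(probs, key=probs.get) for probs in windows]


def smooth(calls, radius=1):
    """Mode filter: replace each call with the most common call among its neighbours."""
    smoothed = []
    for i in range(len(calls)):
        neighbourhood = calls[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


def summarize(calls):
    """Fraction of windows assigned to each population."""
    counts = Counter(calls)
    return {pop: n / len(calls) for pop, n in counts.items()}


def soft_summary(windows):
    """Average the per-window probabilities instead of hard-calling them."""
    totals = Counter()
    for probs in windows:
        totals.update(probs)  # adds the float probabilities per population
    return {pop: p / len(windows) for pop, p in totals.items()}


if __name__ == "__main__":
    # Ten hypothetical windows, each 60% Middle Eastern / 40% South Asian.
    windows = [{"Middle Eastern": 0.6, "South Asian": 0.4}] * 10
    print(summarize(hard_calls(windows)))  # {'Middle Eastern': 1.0}
    print(soft_summary(windows))           # roughly 0.6 / 0.4
    print(smooth(["ME", "ME", "SA", "ME", "ME"]))  # ['ME', 'ME', 'ME', 'ME', 'ME']
```

With ten windows that are each 60/40, the hard calls report 100% Middle Eastern while the averaged probabilities keep the 60/40 split, and the mode filter erases the isolated South Asian window.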

    I don't think doing that is wrong or right, but people should understand that 23andMe shows recent and geographical (not deep or ethnic/"racial") ancestry.

    That's why they claim what they do about the timeframe (that their test goes back only a few centuries, to the Early Modern Era).

  2. #2
    Joey37
    Regular Member

    Join Date
    11-06-18
    Location
    Coventry, Rhode Island

    Y-DNA haplogroup
    R1a-YP445
    MtDNA haplogroup
    J1c2b

    Ethnic group
    Celto-Germanic
    Country: USA - Rhode Island



    Also, they do a lousy job of detecting Low German; by simply doing the math from my (online) paper trail, I determined that all of my 15% Broadly Northwest European is my Low Saxon ancestry (from both the Netherlands and Germany).
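The "math via paper trail" is expected-value arithmetic: each ancestor n generations back contributes, on average, 1/2^n of your autosomal DNA. A minimal Python sketch using a made-up pedigree (not Joey37's actual tree). Real inheritance scatters around these expectations because of recombination, so the result is an expectation, not a prediction of what a test will report.

```python
from fractions import Fraction


def expected_shares(ancestors):
    """Expected autosomal share per origin.

    ancestors: (generation, origin) pairs, one per documented ancestor at the
    end of each line (1 = parent, 2 = grandparent, 3 = great-grandparent, ...).
    Lines must not overlap, so the shares should add up to exactly 1.
    """
    shares = {}
    for generation, origin in ancestors:
        shares[origin] = shares.get(origin, Fraction(0)) + Fraction(1, 2 ** generation)
    if sum(shares.values()) != 1:
        raise ValueError("pedigree does not account for 100% of ancestry")
    return shares


if __name__ == "__main__":
    # Hypothetical pedigree: one Low Saxon great-grandparent (1/8) and one
    # Low Saxon 3x-great-grandparent (1/32); every other line is "Other".
    pedigree = [(1, "Other"), (2, "Other"), (3, "Low Saxon"),
                (4, "Other"), (5, "Low Saxon"), (5, "Other")]
    for origin, share in expected_shares(pedigree).items():
        print(origin, share, f"{float(share):.1%}")
```

Here the Low Saxon lines come to 5/32, about 15.6%. Using `Fraction` keeps the sums exact, so the "adds up to 100%" check does not suffer from floating-point rounding.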

  3. #3
    Regular Member

    Join Date
    12-03-14


    Country: Italy



    Quote Originally Posted by Tomenable
    I don't think doing that is wrong or right, but people should understand that 23andMe shows recent and geographical (not deep or ethnic/"racial") ancestry.
    That's why they claim what they do about the timeframe (that their test goes back only a few centuries, to the Early Modern Era).
    Hmm... Perhaps 23andMe should include even more informative clusters, like ANE, WHG, etc.? :)
    Seriously, I don't get it. I certainly respect his point of view, but my opinion is different. I'm not sure what exactly he wants: just knowing shared ancient ancestries rather than actual recent ancestry? Direct ancient influences? And how far back?
    Well, I really don't know the details of 23andMe's methodology, but no, apparently that's not "why" they claim what they do about the timeframe; it's "because" of the established timeframe. In fact, they are being criticized here for doing what seems a very complex job, one that other commercial companies would like to do, and in fact try to do, without such success. For example, if a company can say a full Italian is 100% Italian without compromising other clusters, which is a pretty difficult thing, then this company must be great. IMO the supposed flaw is in fact a virtue. Additionally, commercial companies generally have to choose just one kind of calculator, not several, as we have on GEDmatch.
    Kurd is great and has written good algorithms (I bought some; Geneplaza's, right?), but I wonder if he himself could do the same as 23andMe. 98-100%? Wow! That seems realistic and informative, because some people, especially mixed people, "do need a test to tell you that", and in fact that's what they're hoping for, right? I'm not sure "Oracles", for example, would do the job more accurately. Again, I wonder what exactly he wants. That 23andMe says you're 10% West Asian, 10% East Med, and this kind of stuff? That it keeps the current clusters and makes them more flexible? Probably the latter. I don't have much knowledge on this, but I guess it would be even easier for 23andMe or anyone else. But come on! No problem with these more flexible approaches, but they're certainly not the purpose of this kind of commercial test.
    So, IMO, 23andMe is certainly not perfect, but its methodology seems right. What it could, should, and probably will do is add more sub-regional clusters for Western Asian, South Asian, etc.

    https://www.23andme.com/ancestry-composition-guide/

  4. #4
    Angela
    Advisor

    Join Date
    02-01-11


    Ethnic group
    Italian
    Country: USA - New York



    Yes, well, while his Ancient Calculator was excellent, his other ones are horrible for Europeans, so... I am neither Bulgarian nor Albanian.

    A lot of the other amateur calculators are horrible too. Why does Gedmatch still carry the J Test? It's a joke. For Italians, in particular, you'd need to include a LOT of reference samples to get decent fits.

    Plus, 23andMe isn't in the business of telling you about your ancient ancestors; that isn't of interest to their customers. Neither is Ancestry, which is the other decent one. The rest are terrible.

    If you want to try to match yourself against ancient samples you can use the raw data.


    One does not do one's duty so that someone will say thank you; one does it on principle, for oneself, for one's own dignity. Oriana Fallaci

  5. #5
    Regular Member

    Join Date
    09-04-15


    Country: Portugal



    With all the problems that it has, it is still reliable if you are a third- or fourth-generation immigrant to the US or Canada, for instance.
    To me, though, the major problem is actually classifying ancient admixture (what is from where); it isn't always that simple. For example, let's say that around the year 1000 a man from what is now Poland went to what is now France. He had children who stayed in France, then went back to Poland and had more children there. For the sake of simplicity, let's pretend that since then there have been no major demography-changing events and that this man's DNA is somehow still found in a discernible percentage in a descendant today.
    Would his DNA be classified as Polish or French? The easy answer is Polish, but if the descendants used as references for the calculator are French, then it will be French. And even if you build a calculator using only DNA from people who were buried in a certain location in the year 1000, this problem can still arise.
    So I believe we can only build a truly reliable calculator if the samples are taken from a specific time period and are labeled by where they lived in that period.
    I hope I made myself understood, since even for me this is pretty complex to explain. Still, I hope I did not drift too far from the topic of the thread.
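The labeling problem in the Poland/France example can be illustrated with the simplest kind of reference-based classifier: a segment gets the name of whichever reference panel it most resembles, so the label follows where the segment's carriers were sampled, not where it originated. A minimal Python sketch using a nearest-centroid rule and made-up two-marker profiles; real calculators use far more markers and more elaborate models.

```python
import math


def nearest_label(segment, references):
    """Label a segment with the reference panel whose allele-frequency
    profile is closest in Euclidean distance."""
    return min(references, key=lambda label: math.dist(segment, references[label]))


if __name__ == "__main__":
    segment = [0.8, 0.2]  # hypothetical profile of the medieval man's segment
    # Scenario A: the descendants carrying it were sampled for the French panel.
    panels_a = {"French": [0.8, 0.25], "Polish": [0.3, 0.7]}
    # Scenario B: the same descendants ended up in the Polish panel instead.
    panels_b = {"French": [0.3, 0.7], "Polish": [0.8, 0.25]}
    print(nearest_label(segment, panels_a))  # French
    print(nearest_label(segment, panels_b))  # Polish
```

The segment itself is identical in both runs; only the composition of the reference panels changes the answer.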

  6. #6
    Angela
    Advisor

    Join Date
    02-01-11


    Ethnic group
    Italian
    Country: USA - New York



    I just saw a post by Razib Khan saying he's impressed by the sub-regional break down 23andme produces. So am I.

    They got it exactly right for me without my ever telling them anything about my ancestry. Well, they don't show the eastern Ligurian, but I've come to the conclusion that eastern Ligurians are a mix of western Ligurians, Emilians and Tuscans. That would explain why Emilia is the closest match, but undoubtedly still "south" of my father's score.



    As I said upthread, if you want "ancient" ancestry, get the raw data and use other tools.

    Hold on, they've gotten really specific. There are dark blue intrusions into the eastern Ligurian Alps; into what looks like the border area where Lombardia, Piemonte, and Liguria meet (they speak Ligurian there, and we have a reference sample from it); and into what may be Sarzana and La Spezia, which are in eastern Liguria.

    Is the southern part of Tuscany not included in the dark blue and is instead shoved into Lazio??? That would also make sense.

    Well done, well done! Excellent!

  7. #7
    TardisBlue
    Regular Member

    Join Date
    11-02-17

    MtDNA haplogroup
    W3a1

    Ethnic group
    Cheesy macaroni
    Country: France



    My regional breakdown is spot on for Italy… BUT I have provided them with genealogical information. As far as my French goes, they completely failed to pick it up: I've only got 3.8% undefined French and German, and no regional breakdown. The latest update (Beta version), however, has increased my F&G to 9.8% and decreased my Italian to 49.5% (a 16% decrease), while increasing my Balkan and Greek by 1 point. I'm pretty sure that if I have my Mom tested and her results phased against mine, my results will be more accurate and the F&G will go up (hopefully with a regional breakdown).

  8. #8
    Regular Member

    Join Date
    12-03-14


    Country: Italy



    Quote Originally Posted by TardisBlue
    My regional breakdown is spot on for Italy… BUT, I have provided them with genealogical information. As far as my French goes, they completely failed to pick it up, as I've only got 3.8% undefined French and German, and no regional breakdown. The latest update (Beta version) has however increased my F&G to 9.8%, decreased my Italian to 49.5% (16% decrease) while increasing my Balkan and Greek by 1 point. I'm pretty sure that if I have my Mom tested and her results phased against mine, my results will be more accurate and the F&G will go up (hopefully with a regional breakdown).
    Apparently trio-phasing does make a big difference, mainly by increasing recall, I'd guess. The higher the level (sub-regional -> regional -> continental), the higher the precision and recall. French & German, by the way, is a cluster with low recall: just 20%. That means a good chunk of it must be "hidden" as Broadly Northern European, which is not necessarily an imprecision. You got 9.8%, but that doesn't mean they are wrongly assigning most of your other actual F&G %. A low recall is generally somewhat balanced by the precision of the "competing" clusters. So it's as if those other F&G %s are subject to additional checks, so to speak, tied to the precision of the other clusters, which is generally high; in your case they would tend to be assigned to Broadly Northern European, which in turn has good precision "and" recall. F&G is a really difficult cluster anyway. I'm not sure they'll manage to "solve" its low recall.
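The recall argument can be put into numbers with a toy model in Python. It assumes, hypothetically, that every F&G segment the system misses is reported at the parent "Broadly" level rather than under another specific cluster, and it ignores false positives; the 20% recall is the figure quoted in the post above.

```python
def split_with_recall(true_share, recall):
    """Split a true ancestry share between a specific cluster and its parent
    "Broadly" label, assuming missed segments fall back to the parent and
    no other cluster's segments are misassigned into it."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    reported_specific = true_share * recall
    reported_broadly = true_share - reported_specific
    return reported_specific, reported_broadly


if __name__ == "__main__":
    # Hypothetical: 20% true French & German ancestry, 20% recall for F&G.
    specific, broadly = split_with_recall(20.0, 0.20)
    print(f"F&G reported: {specific:.1f}%, Broadly Northern European: {broadly:.1f}%")
```

Under these assumptions, someone with 20% true F&G would see only about 4% labeled F&G and about 16% rolled up into the broad category, which is the kind of "hiding" described above.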

    That said, there is naturally still a margin of error, not just as suggested by the numbers 23andMe itself provides, but also because some countries are just complicated. I doubt their references cover all of Italy, for example, which would mean some results don't fit their statistics, since those are not based on "all" kinds of Italians. Trio-phasing made my results almost perfect. However, my mother is still getting too much Northern % (31.9% F&G + 1.3% B&I + 5.4% Broadly NW. E.). Yes, it may be shared ancestry, but that doesn't matter, given the purpose of Ancestry Composition. She is Italian in ancestry, certainly for more than 500 years, and 23andMe's statistics don't fit her results. Nor my father's (though his results are not as far off as my mother's). Now I want to know how they will solve that without compromising other clusters.
    Anyway, this likely doesn't apply to France (the country) to the same degree, so I guess the related precision and recall are more realistic in your case.
    Still, I believe Ancestry Composition is the best. Besides, there is this tool showing which regions likely match, etc., which compensates a bit. My mother is well placed by this tool.

    @Angela
    Sorry! I got more Italian than you in the new version: 85%. ;)

  9. #9
    Angela
    Advisor

    Join Date
    02-01-11


    Ethnic group
    Italian
    Country: USA - New York



    Quote Originally Posted by Regio X
    @Angela
    Sorry! I got more Italian than you in the new version: 85%. ;)
    They clearly made a terrible error. :)



    That was all on the v3 chip, so a lot more SNPs, I think.

    This is the "Italian" map:


    This is v4/v5: they use fewer SNPs. I don't know whether they're perhaps "better" SNPs, but it does produce slightly different results.


    The Italian map:


    Obviously, since the older version says I'm more Italian, that's the correct one. :)

    I wonder whether, when they run the reference individuals, they use only the SNPs on the chip you have. That might make a difference.
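The idea in the last paragraph amounts to intersecting SNP sets: compare a customer only against the reference markers their own chip version reports. A minimal Python sketch with made-up SNP identifiers; whether 23andMe actually does this is not known from the thread.

```python
def restrict_to_chip(reference_genotypes, chip_snps):
    """Keep only the reference SNPs that the customer's chip also reports,
    and return the fraction of reference SNPs that survive."""
    shared = set(reference_genotypes) & set(chip_snps)
    restricted = {snp: reference_genotypes[snp] for snp in sorted(shared)}
    coverage = len(shared) / len(reference_genotypes) if reference_genotypes else 0.0
    return restricted, coverage


if __name__ == "__main__":
    # Hypothetical SNP names; a larger v3-style reference and a smaller v5-style chip.
    reference = {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT", "snp_d": "AA"}
    v5_chip = ["snp_a", "snp_c", "snp_e"]
    restricted, coverage = restrict_to_chip(reference, v5_chip)
    print(restricted)  # {'snp_a': 'AG', 'snp_c': 'TT'}
    print(coverage)    # 0.5
```

A low coverage value is one way to see how much of the reference information a given chip version simply cannot use.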

  10. #10
    Elite member

    Join Date
    07-09-14

    Y-DNA haplogroup
    R1b
    MtDNA haplogroup
    W6

    Ethnic group
    Polish
    Country: Poland



    Well, as for 23andMe's recently introduced sub-regional matches: I was asking Living DNA customer service about my updated Living DNA results, and I also asked them what they think of the recent 23andMe updates.

    They told me:

    Quote: "(...) Regarding your question about 23andMe - their regions are not DNA based. They are records/cousin matching based. If you share some distant DNA with people living in a particular region, they add that region as a part of your breakdown. So comparing the number of regions offered by different DTC companies became more difficult, because we are comparing two different things. (...)" - wrote a person from Living DNA team.

    So this new "recent ancestors feature" of 23andMe is generally based on where your matches are from. It is not based on ancestry-informative markers (ancestry-informative SNPs) in your own DNA, but on segments that you share with other people.

    In other words, if you use the "GEDCOM + DNA matches" feature on GEDmatch and go through the family trees of all your matches, you can figure out something very similar (which regions your closest matches are from). Of course it would take a lot of time, and 23andMe does it for you, so kudos to them for this. I just wonder whether they check the birthplaces of your matches' great-grandparents, or report this based on your matches' current places of residence alone.
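The manual approach described above (reading your matches' family trees and noting where their ancestors came from) boils down to a weighted tally. A minimal Python sketch, assuming you have already collected one region per match and the amount of DNA shared in centimorgans; the 20 cM cut-off and the region names are hypothetical choices, and this is not 23andMe's or Living DNA's actual method.

```python
from collections import defaultdict


def region_weights(matches, min_cm=20.0):
    """Sum shared centimorgans per region, ignoring small matches.

    matches: iterable of (region, shared_cm) pairs (hypothetical input).
    Returns (region, fraction of total weight) pairs, largest first.
    """
    totals = defaultdict(float)
    for region, shared_cm in matches:
        if shared_cm >= min_cm:
            totals[region] += shared_cm
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(((region, cm / grand_total) for region, cm in totals.items()),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    matches = [("Masovia", 45.0), ("Lesser Poland", 30.0),
               ("Masovia", 25.0), ("Silesia", 8.0)]
    print(region_weights(matches))  # [('Masovia', 0.7), ('Lesser Poland', 0.3)]
```

Whether `region` holds a match's current residence or their great-grandparents' birthplaces is exactly the open question raised above; the tally itself is the same either way.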

    I've ordered 23andMe and am waiting for my results; we'll see which Polish regions they report for me.
    Last edited by Tomenable; 16-04-19 at 09:36.

  11. #11
    Regular Member

    Join Date
    12-03-14


    Country: Italy



    Quote Originally Posted by Angela
    Obviously, since the older version says I'm more Italian, that's the correct one. :)
    This is controversial. :)

    The maps are really cool. They place us perfectly in Veneto, and the others I've seen, including yours, are generally right, or almost right. Pretty intelligent tool!

    We tested v4. The 23andMe results changed a short time ago. Mine and my father's got much better, while my mother's are still close to the old ones, like yours, apparently.
    If you check the numbers 23andMe provides in its guide, you'll see that yours (similar to my father's), like my mother's, don't fit. In an ideal situation, we should be categorized as Italian, Broadly Southern European and Broadly European. That said, the precision of the Northern European cluster is supposedly 96%. So, in theory, according to this latest update, you would almost surely be 19% Northern European, while my mother would be ~37% at least. There must be a margin, of course; still, these are not exactly low numbers, so if the statistics were right, the probability that a good chunk of them is correct should approach 100%. This is the way I see it.
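The "~37% at least" figure can be checked with a short calculation in Python. It sums the mother's Northern components listed earlier in the thread (31.9% F&G + 1.3% B&I + 5.4% Broadly NW European) and applies the 96% precision quoted for the Northern European cluster. Treating precision as a per-piece probability of being correct is a simplification, and it assumes the published figure applies to this individual, which is exactly what the post goes on to doubt.

```python
def expected_correct(assigned_percentages, precision):
    """Expected percentage that truly comes from the predicted population,
    treating precision as the chance that each assigned piece is correct."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    return sum(assigned_percentages) * precision


if __name__ == "__main__":
    # Mother's Northern components: F&G, British & Irish, Broadly NW European.
    mother = [31.9, 1.3, 5.4]
    print(f"assigned: {sum(mother):.1f}%")                             # assigned: 38.6%
    print(f"expected correct: {expected_correct(mother, 0.96):.1f}%")  # expected correct: 37.1%
```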
    But we know this is not true, because our ancestries are actually fully Italian, which means the precision/recall statistics are not that accurate, at least when it comes to Italy, a country with relatively high genetic diversity. Maybe they used Italian references from relatively few areas?
    Or perhaps the precision and recall %s are valid only for conservative mode? I don't even know if it still exists. If so, why doesn't 23andMe specify that?
    Yet the timeline says my mother has some "foreign" ancestry within the last 2-4 generations, for example. Wrong.

    From the guide:
    "Precision answers the question 'When the system predicts that a piece of DNA comes from population A, how often is the DNA actually from population A?' Recall answers the question 'Of the pieces of DNA that actually are from population A, how often does the system correctly predict that they are from population A?'"
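The two definitions quoted from the guide translate directly into counts. A minimal Python sketch over a handful of made-up labeled pieces of DNA (the labels are hypothetical, not 23andMe data):

```python
def precision_recall(predicted, actual, population):
    """Precision: of the pieces predicted as `population`, the share that truly are.
    Recall: of the pieces that truly are `population`, the share predicted as such."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == population and a == population)
    predicted_pos = sum(1 for p, _ in pairs if p == population)
    actual_pos = sum(1 for _, a in pairs if a == population)
    precision = true_pos / predicted_pos if predicted_pos else float("nan")
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return precision, recall


if __name__ == "__main__":
    predicted = ["Italian", "Italian", "F&G", "Broadly NW European", "Italian"]
    actual = ["Italian", "F&G", "F&G", "F&G", "Italian"]
    print(precision_recall(predicted, actual, "F&G"))      # (1.0, 0.3333333333333333)
    print(precision_recall(predicted, actual, "Italian"))  # (0.6666666666666666, 1.0)
```

In this toy data F&G has perfect precision but low recall (two of its three pieces end up elsewhere), while Italian has perfect recall but absorbs one F&G piece, lowering its precision.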

    I just checked. They say they use 606 reference individuals for the Italian cluster. Plus:
    "The reference datasets are made up of individuals from publicly available datasets including the Human Genome Diversity Project, HapMap, and the 1000 Genomes project, as well as individuals from private 23andMe data collections and a large number of 23andMe customers who have consented to participate in research. In total, there are 10,419 research-consented customers and 2,612 non-customers in these population reference datasets."

    Apparently, a lot of people. I'm confused. :)

    @Tomenable
    So they use IBD, indirectly.

  12. #12
    Angela
    Advisor

    Join Date
    02-01-11


    Ethnic group
    Italian
    Country: USA - New York



    Quote Originally Posted by Regio X
    Apparently, a lot of people. I'm confused. :)

    @Tomenable
    So they use IBD, indirectly.
    It may be a lot of people, but it's all relative. I would bet they have tens of thousands of people with different types of Northern European ancestry. Plus, they're very homogeneous.

    I don't think they have a lot for certain areas of Italy, especially in the north. Italian-Americans do test, but the vast majority are from southern Italy and Sicily.

    Maybe they have quite a few from your Dad's area of the Veneto but not from your mother's. If some obsessed person tests twenty relatives, you're going to get a lot of similarity.

    No one in my area(s) seems to be at all interested, so I'm not surprised they can't assign some of me as "Italian". After all this time I only know a handful of people from eastern Liguria who have tested, and one or two from the Emilian Apennines. A lot of my "Italian" matches are actually Argentines from La Spezia, Genova and Parma who were part of a medical study.

    On the "relatives" feature, where they do indeed use IBD analysis, my only Italian matches are from Emilia, Toscana, and Liguria (usually partial ancestry at that), and the rest are non-Italian. In fact, they tell me that 80% of my "matches" report "French-German" ancestry. I've had two families with Danish ancestry since the beginning, and it's clearly not recent.


    The number of SNPs on the chip is another factor. I'm 83% Italian on the V2/V3 chip, which has a lot more SNPs.

  13. #13
    TardisBlue
    Regular Member

    Join Date
    11-02-17

    MtDNA haplogroup
    W3a1

    Ethnic group
    Cheesy macaroni
    Country: France



    Quote Originally Posted by Regio X
    Apparently trio-phasing does make a big difference, mainly by increasing recall, I'd guess. The higher the level (sub-regional -> regional -> continental), the higher the precision and recall. French & German, by the way, is a cluster with low recall: just 20%. That means a good chunk of it must be "hidden" as Broadly Northern European, which is not necessarily an imprecision. You got 9.8%, but that doesn't mean they are wrongly assigning most of your other actual F&G %. A low recall is generally somewhat balanced by the precision of the "competing" clusters. So it's as if those other F&G %s are subject to additional checks, so to speak, tied to the precision of the other clusters, which is generally high; in your case they would tend to be assigned to Broadly Northern European, which in turn has good precision "and" recall. F&G is a really difficult cluster anyway. I'm not sure they'll manage to "solve" its low recall.
    Thanks for your explanations, Regio X. I don't know much at all about DNA testing, and the knowledge of members here is impressive. What puzzles me is my low Broadly NW European: only 8.6%. If you add the Broadly European and F&G, that only makes 23.3%. If we include the British and Irish, that makes 27.3%. I also have 2.1% unassigned. The rest is Italian, Greek and Balkan, 1.3% Spanish and Portuguese, and Broadly Southern European (15.5%), with a tiny bit of West Asian and North African. So, all in all, my 23andMe results seem to be more south-shifted, for now.
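The running totals in this post also pin down the two components that are not stated directly. A small Python check, assuming the F&G figure is still the 9.8% reported earlier in the thread after the latest update:

```python
def implied_components(broadly_nw, f_and_g, subtotal, total_with_b_and_i):
    """Back out Broadly European and British & Irish from the running totals.

    subtotal = Broadly NW European + Broadly European + F&G
    total_with_b_and_i = subtotal + British & Irish
    """
    broadly_european = subtotal - broadly_nw - f_and_g
    british_irish = total_with_b_and_i - subtotal
    # Round to one decimal, matching the precision of the reported percentages.
    return round(broadly_european, 1), round(british_irish, 1)


if __name__ == "__main__":
    print(implied_components(8.6, 9.8, 23.3, 27.3))  # (4.9, 4.0)
```

That works out to about 4.9% Broadly European and 4.0% British and Irish.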

  14. #14

    Join Date
    12-03-14
    Posts
    481


    Country: Italy



    Quote originally posted by Angela:
    It may be a lot of people, but it's all relative. I would bet they have tens of thousands of people with different types of Northern European ancestry. Plus, they're very homogeneous.

    I don't think they have a lot for certain areas of Italy, especially in the north. Italian-Americans do test, but the vast majority are from southern Italy and Sicily.

    Maybe they have quite a few from your Dad's area of the Veneto but not from your mother's. If some obsessed person tests twenty relatives, you're going to get a lot of similarity.

    No one in my area(s) seems to be at all interested, so I'm not surprised they can't assign some of me as "Italian". After all this time I only know a handful of people from eastern Liguria who have tested, and one or two from the Emilian Apennines. A lot of my "Italian" matches are actually Argentines from La Spezia, Genova and Parma who were part of a medical study.

    On the "relatives" feature, where they do indeed use IBD analysis, my only Italian matches are from Emilia, Toscana, and Liguria (usually only partial ancestry at that), and the rest are non-Italian. In fact, they tell me that 80% of my "matches" report "French-German" ancestry. I've had two families with Danish ancestry since the beginning, and it's clearly not recent.


    The number of SNPs on the chip is another factor. I'm 83% Italian on the V2/V3 chip, which has a lot more SNPs.
    Yeah, you must be right. Most testers with Italian ancestry are Americans, so the presumed lack of references may explain their statistics, which are naturally built only on what they have in hand. Anyway, trio-phasing really works, and my results suggest that a good chunk of my mother's N. European could indeed be categorized as Italian, and in fact is Italian. Indeed, according to 23andMe itself, everything I inherited from her is Italian. 50%! Very interesting!
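    Regio X credits trio-phasing for the improvement. The thread doesn't say how 23andMe implements it, so what follows is only a simplified sketch of the general idea, with hypothetical function names: at a single biallelic SNP, with genotypes coded as 0, 1 or 2 copies of the alternate allele, the parents' genotypes often pin down which allele the child received from each parent. Sites where mother, father and child are all heterozygous stay ambiguous and would need other information (such as neighbouring sites) to resolve.

```python
from typing import List, Optional, Tuple

# Alleles a parent can transmit, by genotype (count of alternate alleles).
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_site(child: int, mother: int, father: int) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) if the trio fixes it, else None."""
    candidates = [
        (m, p)
        for m in TRANSMITTABLE[mother]
        for p in TRANSMITTABLE[father]
        if m + p == child
    ]
    # Exactly one consistent split means the phase is known; zero means a
    # Mendelian inconsistency, two means all three are heterozygous.
    return candidates[0] if len(candidates) == 1 else None


def phase_trio(child: List[int], mother: List[int], father: List[int]):
    """Split the child's genotypes into maternal and paternal haplotypes (None = unresolved)."""
    maternal: List[Optional[int]] = []
    paternal: List[Optional[int]] = []
    for c, m, f in zip(child, mother, father):
        site = phase_site(c, m, f)
        maternal.append(site[0] if site else None)
        paternal.append(site[1] if site else None)
    return maternal, paternal


mat, pat = phase_trio([1, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 0, 1, 1, 1])
print(mat)  # [0, 1, 1, 0, None]
print(pat)  # [1, 0, 1, 0, None]
```

    Once the maternal and paternal haplotypes are separated, each can presumably be assigned ancestry on its own, which is what allows a report to say which share came from which parent.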

  15. #15



    Quote originally posted by TardisBlue:
    Thanks for your explanations, Regio X. I don't know much at all about DNA testing, and the knowledge of members here is impressive. What puzzles me is my low Broadly NW European, only 8.6%. If you add the Broadly European and F&G, that only makes 23.3%. If we include the British & Irish, that makes 27.3%. I also have 2.1% unassigned. The rest is Italian, Greek & Balkan, 1.3% Spanish & Portuguese, and Broadly Southern European (15.5%), with a tiny bit of W. Asian and N. African. So all in all, my 23andMe results seem to be more south-shifted, for now.
    And what is your actual ancestry/admixture? And which region(s) of France?

  16. #16
    TardisBlue
    Join Date
    11-02-17
    Posts
    104

    MtDNA haplogroup
    W3a1

    Ethnic group
    Cheesy macaroni
    Country: France



    Quote originally posted by Regio X:
    And what is your actual ancestry/admixture? And which region(s) of France?
    Actual ancestry is 50% Campania on the paternal side. Maternal side: 37.5% Champagne-Ardenne (NE France) + 12.5% Haute-Savoie (not far from the Italian border, so that could be interpreted as Italian, though I don't know how much DNA I may have inherited from my great-grandfather). FTDNA, on the other hand, gives me 55% W. and Central Europe (and only 17% SE Europe + 19% Middle Eastern), which is why I'm puzzled by my 23andMe results. I'm sure phasing would "fix" it, so I'm considering having my mom tested at some point (she tested with MyHeritage; I uploaded her data to GEDmatch, where she comes up as mostly French or Dutch).

  17. #17
    Angela
    Join Date
    02-01-11
    Posts
    15,178


    Ethnic group
    Italian
    Country: USA - New York



    FWIW, I get the same similarity number for Savoy as for Lombardy, so they're probably very "Italian"-like. That would bring you up to 72.5%, which is pretty close to what you get, isn't it? Wasn't it 75% Italian?

    Edit: 62.5%
    Last edited by Angela; 13-04-19 at 23:44.
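    Angela's corrected figure follows from giving each of TardisBlue's eight great-grandparents an equal 1/8 share. The sketch below assumes the stated percentages correspond to 4 Campanian, 3 Champagne-Ardenne and 1 Haute-Savoie great-grandparent, and `expected_shares` is a hypothetical helper. These are genealogical expectations only: as TardisBlue notes, the DNA actually inherited from any one great-grandparent can differ from 12.5% because of recombination.

```python
from fractions import Fraction


def expected_shares(ancestors):
    """Expected genealogical share per region, one entry per ancestor of the same generation."""
    share = Fraction(1, len(ancestors))
    totals = {}
    for region in ancestors:
        totals[region] = totals.get(region, Fraction(0)) + share
    return totals


# Eight great-grandparents, inferred from the 50% / 37.5% / 12.5% split in post #16.
great_grandparents = ["Campania"] * 4 + ["Champagne-Ardenne"] * 3 + ["Haute-Savoie"]
shares = expected_shares(great_grandparents)

for region, frac in shares.items():
    print(f"{region}: {float(frac) * 100:.1f}%")

# Counting Haute-Savoie as "Italian"-like, as Angela suggests.
italian_like = shares["Campania"] + shares["Haute-Savoie"]
print(f"Italian-like: {float(italian_like) * 100:.1f}%")  # Italian-like: 62.5%
```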

  18. #18
    TardisBlue



    Quote originally posted by Angela:
    FWIW, I get the same similarity number for Savoy as for Lombardy, so they're probably very "Italian"-like. That would bring you up to 72.5%, which is pretty close to what you get, isn't it? Wasn't it 75% Italian?
    Not that much: I got 66% with the normal version, and the latest beta version brings it down to 49.6%, which does reflect my actual ancestry. However, if you add the Broadly Southern European plus the Greek & Balkan, it's still "too much" compared to my actual ancestry. Not that I mind, though; it just puzzles me.

  19. #19

    Join Date
    16-10-17
    Posts
    206

    Y-DNA haplogroup
    H2a1 M9313
    MtDNA haplogroup
    H1c3b

    Country: UK - England



    My 23andMe results changed slightly, due to the update I think. They still show 100% European (same as before), but now 98.3% Northwest European.

    The remaining 1.7% is split between Southern European 1.2% (Spanish & Portuguese 0.2% and Broadly Southern European 1.0%) and Broadly European 0.5%.
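    The figures in this post nest: the Southern European subtotal is the sum of its two subregions, and the three top-level groups should add up to 100%. A small consistency check of that hierarchy is sketched below. It assumes the Spanish & Portuguese share is 0.2%, since 0.02% would leave the Southern European subtotal at about 1.0% rather than 1.2%. The nested-dict layout and the `total` helper are hypothetical illustrations, not 23andMe's report format.

```python
def total(node):
    """Sum a nested breakdown: dicts are categories, numbers are leaf percentages."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


# Post #19's breakdown, taking Spanish & Portuguese as 0.2% (an assumption).
report = {
    "Northwest European": 98.3,
    "Southern European": {
        "Spanish & Portuguese": 0.2,
        "Broadly Southern European": 1.0,
    },
    "Broadly European": 0.5,
}

print(round(total(report["Southern European"]), 1))  # 1.2
print(round(total(report), 1))  # 100.0
```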
