23andMe Questions about 23andme

Angela · Apr 29, 2016

I thought I'd start a general thread for questions about how 23andme works, since people do have general questions about the algorithm and how to interpret results.

I'll just start with one important point that still causes some confusion. The accuracy of the results depends on coverage, that is, on the number of reference samples from any given area.

If person X has all ancestors from region Y for five hundred years, and there were lots and lots of representative samples from region Y, 23andme would be able to tell person X that he or she is 100% typical of that region.

Unfortunately, 23andme doesn't have anywhere near the number of samples from most areas for that to work accurately.

I'll use as an example a sample from La Spezia, Italy, a Ligurian city.

This person's family has been documented in the area since the mid-1500s. By any common-sense definition, this person is "Italian". Yet these are the results from 23andme (51% "Italian", 73% "Southern European" overall):

Why is the "Italian" score so low? The reason is that there are very few northern Italians in the 23andme "Italian" reference population. Most of the samples used are from Italian-Americans whose ancestors came from southern Italy and Sicily. The next largest group is from Tuscany, because of the many academic studies done there, including 1000 Genomes. This Spezzino, coming from an area close to Toscana, will share alleles with them, and so shows up as more "Italian" than someone from Friuli, for example, who might score only 35% "Italian".

For northern Italians, all they have are the 8 samples from Bergamo and the handful of private testees.

If there were hundreds and hundreds of samples from each area the total for "Italian" would go way up.
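A minimal sketch of why the sample count matters, assuming the simplest textbook model: a reference population's allele frequency is estimated by counting alleles in n unrelated diploid people. This is plain binomial sampling, not 23andme's actual method, and the 0.3 frequency is made up; the sample sizes echo the "Italian" reference counts listed later in the thread.

```python
import math

def allele_freq_se(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele-frequency estimate from
    n diploid individuals (2n sampled chromosomes), assuming random sampling."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))

# Hypothetical allele with true frequency 0.3, estimated from panels of
# different sizes (8 HGDP Tuscans, 98 from 1000 Genomes, 556 testees).
for n in (8, 98, 556):
    print(n, round(allele_freq_se(0.3, n), 4))
```

The error shrinks with the square root of n, so 556 samples pin down a frequency roughly eight times more tightly than 8 samples do (sqrt(556/8) is about 8.3).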

This applies even to southern Italians, I think, and their supposed additional "Middle Eastern" (actually Caucasus) percentages. If there are a lot of samples from a given town, the algorithm has a better chance of recognizing the next person to test from that area as having a very high "Italian" percentage. However, that isn't always the case. The total number of Italians who have tested is a couple of orders of magnitude smaller than the number of people of British descent who have tested, for example. So some people won't match perfectly.

This applies when comparing southern Italians and Greeks as well. The way 23andme clusters different groups obscures more ancient relationships. Every other academic genetic analysis I can recall finds very similar levels of "Caucasus" and "West Asian" ancestry in mainland Greeks and in southern Italians, yet that doesn't appear to be the case in 23andme. The genetics hasn't changed; it's just an artifact of the 23andme method.

People lose sight of what service 23andme is trying to provide. It's attempting to tell people where the majority of their ancestors lived in the last 500 years. Unlike academic studies, it isn't trying to trace the population history of each region over the last 3000 or 5000 years. Even for its stated goal, it isn't producing a totally accurate picture, partly because it doesn't have enough samples from certain areas, and partly because I don't think that's ever been their primary focus, and it's less important to them by the day. If they really cared about this, they'd at least include all the samples from all the academic papers that have been done. They haven't and they won't. No one should even expect an update of Ancestry Composition (AC) based on all the new samples they get.

The other problem is that some people attempt to use their results for one agenda-driven purpose or another without really understanding how the algorithm works.

This is part of the reason why there's pushback against these consumer ancestry tests. Some companies mislead (which I don't think 23andme has done). In other cases, people just don't understand the limitations of this kind of testing, or are deliberately misusing the results.

Anyway, that's my two cents.

Goga · Apr 29, 2016

I did a test at 23andme because I wanted to know my haplogroups and they are a relatively cheap company.

I was not really interested in the regions, because I know where my ancestors were from, and where my native homeland is, because I'm pure, and all my ancestors were 'pure' (my religion doesn't mix with other religions and we don't accept converts).

What I do really think is interesting about 23andme is the information they give about your health.

I'm a satisfied consumer, because they provided me with everything I wanted to know and even more about my health!

Angela · Apr 29, 2016

Goga said:
I did a test at 23andme because I wanted to know my haplogroups and they are a relatively cheap company.

I was not really interested in the regions, because I know where my ancestors were from, and where my native homeland is, because I'm pure, and all my ancestors were 'pure' (my religion doesn't mix with other religions and we don't accept converts).

What I do really think is interesting about 23andme is the information they give about your health.

I'm a satisfied consumer, because they provided me with everything I wanted to know and even more about my health!

Exactly. If you want to know where your ancestors lived, follow the paper trail. It's a lot harder to do than taking a test, but it's more accurate.

As for not accepting converts, are you sure about that? I personally know people who married Greek Orthodox Christians and converted.

Goga · Apr 29, 2016

Angela said:
Exactly. If you want to know where your ancestors lived, follow the paper trail. It's a lot harder to do than taking a test, but it's more accurate.

As for not accepting converts, are you sure about that? I personally know people who married Greek Orthodox Christians and converted.

Yeah, if you know your ethnicity and that of both your parents, and where your native language comes from, you don't need to know more. That knowledge is much more accurate than any DNA test.

It is true that I was baptized in an Orthodox church, because I was born in the Caucasus.

But I don't consider myself a Christian. I was born an ethnic Yezidi. Yezidis are Kurds who still follow our ancient Iranic religion and did not convert to Islam. Our religion does not forbid being baptized in a church, as long as you don't abandon your native Ezdi faith.

I'm an Ezdi Kurd because both my parents are Ezdi Kurds, and the same goes for their parents. If one of your parents is NOT Ezdi, you can't be Ezdi. (Jews consider only the mother important, while Ezdis consider both parents important.)

And it is almost impossible for a non-Ezdi outsider (Muslim or Christian) to become Yezidi. So my people don't accept conversions.

Pax Augusta · Apr 30, 2016

Angela said:
Why is the "Italian" score so low? The reason is that there are very few northern Italians in the 23andme "Italian" reference population. Most of the samples used are from Italian-Americans whose ancestors came from southern Italy and Sicily. The next largest group is from Tuscany, because of the many academic studies done there, including 1000 Genomes. This Spezzino, coming from an area close to Toscana, will share alleles with them, and so shows up as more "Italian" than someone from Friuli, for example, who might score only 35% "Italian".


Angela, very interesting and useful post. Thanks!

On the accuracy of 1000 Genomes I have some doubts. 1000 Genomes uses the HapMap Project samples. Reading the HapMap description of the Tuscans, Toscani in Italia (TSI), we learn some interesting things.

1) Brief Description: These samples were collected from unrelated individuals in a particular town in Tuscany, Italy. They do not necessarily represent all Tuscans, nor all Italians, whose population history is complex. The samples should not be described merely as "Italian", "Southern European," "European" or "Caucasian" since each of those designators encompasses many populations with many different geographic ancestries.

2) At least three out of four grandparents were born in Tuscany.

Pretty interesting. So, according to the geneticists who collected the Tuscan samples, "these samples do not necessarily represent all Tuscans, nor all Italians, whose population history is complex". So why are 23andme, the other commercial companies, and the amateur calculators using them? Last but not least, at least three out of four grandparents were born in Tuscany. Why only three out of four? I would expect all four. I mean, we are talking about grandparents, not great-great-grandparents. Note that it just says "born in Tuscany"; it doesn't say "of fully Tuscan ancestry".

Angela said:
For northern Italians, all they have are the 8 samples from Bergamo and the handful of private testees.

If there were hundreds and hundreds of samples from each area the total for "Italian" would go way up.

Are you referring to the HGDP samples from Bergamo? Right, there are only a few of them, I think 12. The 8 samples are the HGDP Tuscans.

Angela said:
People lose sight of what service 23andme is trying to provide. It's attempting to tell people where the majority of their ancestors lived in the last 500 years. Unlike academic studies, it isn't trying to trace the population history of each region over the last 3000 or 5000 years. Even for its stated goal, it isn't producing a totally accurate picture, partly because it doesn't have enough samples from certain areas, and partly because I don't think that's ever been their primary focus, and it's less important to them by the day. If they really cared about this, they'd at least include all the samples from all the academic papers that have been done. They haven't and they won't. No one should even expect an update of Ancestry Composition (AC) based on all the new samples they get.

The other problem is that some people attempt to use their results for one agenda-driven purpose or another without really understanding how the algorithm works.

This is part of the reason why there's pushback against these consumer ancestry tests. Some companies mislead (which I don't think 23andme has done). In other cases, people just don't understand the limitations of this kind of testing, or are deliberately misusing the results.

I completely agree with you.

Angela · Apr 30, 2016

Pax Augusta said:

Angela, very interesting and useful post. Thanks!

On the accuracy of 1000 Genomes I have some doubts. 1000 Genomes uses the HapMap Project samples. Reading the HapMap description of the Tuscans, Toscani in Italia (TSI), we learn some interesting things.

1) Brief Description: These samples were collected from unrelated individuals in a particular town in Tuscany, Italy. They do not necessarily represent all Tuscans, nor all Italians, whose population history is complex. The samples should not be described merely as "Italian", "Southern European," "European" or "Caucasian" since each of those designators encompasses many populations with many different geographic ancestries.

2) At least three out of four grandparents were born in Tuscany.

Pretty interesting. So, according to the geneticists who collected the Tuscan samples, "these samples do not necessarily represent all Tuscans, nor all Italians, whose population history is complex". So why are 23andme, the other commercial companies, and the amateur calculators using them? Last but not least, at least three out of four grandparents were born in Tuscany. Why only three out of four? I would expect all four. I mean, we are talking about grandparents, not great-great-grandparents. Note that it just says "born in Tuscany"; it doesn't say "of fully Tuscan ancestry".

It's true that 1000 Genomes labels its sample TSI, but it's a different set of samples from those in HapMap. There are over 100 Tuscan samples in the 1000 Genomes data bank.

If you go to this site and then click on spreadsheet, you can see them by scrolling down to TSI.
http://www.1000genomes.org/data

That's a lot for an autosomal sample. So far as I know, they used the standard four-grandparent rule, which I agree presents some problems in the more northern parts of Italy because of internal migration. However, that's the academic standard.

I do know that a set of samples absolutely guaranteed to be attested in a certain area of Italy for 500 years exists. Those are the samples collected in my father's Parma Valley by Cavalli-Sforza. I can't imagine that they destroyed them. I don't know why the geneticists aren't using them.

Hauteville · Apr 30, 2016

Thanks, Angela, very useful and exhaustive post.

Pax Augusta · Apr 30, 2016

Angela said:
It's true that 1000 Genomes labels its sample TSI, but it's a different set of samples from those in HapMap. There are over 100 Tuscan samples in the 1000 Genomes data bank.

The HapMap Tuscan samples number over 100 (117), which matches the figure for the 1000 Genomes Project. The 1000 Genomes Project surely uses the HapMap Tuscan samples for the Tuscans (as confirmed by the sample IDs starting with NA). It's not clear whether the 1000 Genomes Project added other samples; I didn't find anything showing that it did. According to the 1000 Genomes Project, the TSI (Toscani in Italia) samples come from HapMap (the HapMap 3 population, from the third phase of the International HapMap Project).

Source: http://www.1000genomes.org/cell-lines-and-dna-coriell

The 1000 Genomes Project acknowledges that it shares samples with the HapMap project:

The 1000 Genomes Project shares some samples with the HapMap project; any sample which starts with NA was likely part of the HapMap project. In the pilot stages of the project HapMap genotypes were also used to help quality control the data and identify sample swaps and contamination. Since phase 1 the HapMap data has not been used by the 1000 Genomes Project, and all genotypes were independently identified by 1000 Genomes.

The majority of HapMap SNPs are found in the 1000 Genomes Project, there will be a small number of sites we fail to find using next generation sequencing but most sites from HapMap which aren’t found by the 1000 Genomes Project will be false discoveries by HapMap.

http://www.1000genomes.org/category/hapmap/
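A short illustration of the sample-ID heuristic described in the quoted FAQ, assuming the IDs are available as plain strings. The IDs below are illustrative only, and the prefix test is just the FAQ's rule of thumb ("likely", not certain).

```python
def likely_hapmap(sample_ids):
    """Return the IDs starting with "NA", which the FAQ says were
    likely part of HapMap. A heuristic, not a definitive lookup."""
    return [s for s in sample_ids if s.startswith("NA")]

# Illustrative sample IDs, not taken from any real panel listing.
ids = ["NA20502", "NA20503", "HG00096", "NA12878"]
print(likely_hapmap(ids))
```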

Angela said:
That's a lot for an autosomal sample. So far as I know, they used the standard four-grandparent rule, which I agree presents some problems in the more northern parts of Italy because of internal migration. However, that's the academic standard.

By its own description, HapMap didn't use the standard four-grandparent rule.

Angela · Apr 30, 2016

Another common error, in my opinion, is the attempt to use modern clusters of populations to draw conclusions about ancient gene flow: for example, proposing that the "Balkan" percentage in Italians could tell you the amount of ancient "Greek" input into Sicily.

Who says that the Greeks who established colonies in Sicily were like the modern inhabitants of the Balkans? We don't know what the ancient Greeks of the mainland (where most of the colonists came from) were like in the first millennium BC, nor, for that matter, what the populations of southern Italy were like at that time. So how could we possibly estimate the amount of input then and what traces of it remain today? Only ancient DNA will give us some clues.

How could anyone think that a "Balkan" percentage on 23andme based on the current residents of the entire Balkan region can tell us that? Balkan isn't just Greek. It's how 23andme has chosen to cluster a number of modern populations, and it's an extremely problematic cluster at that.

Here it is for those who seem to have forgotten what populations are included:
View attachment 7703

For clarity, Croatians are indeed included in the Balkan reference set. DNA.land is different; there they use Greece, Albania and Bulgaria.

The inclusion of the Maltese is beyond me; they don't belong in this group. This helps explain why modern people from these countries are going to get less "Near Eastern" than southern Italians, whatever the reality of the situation: that ancestry is already part of the cluster.
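A toy illustration of that last point, assuming a single made-up axis running from a Northern European reference (0.0) to a Near Eastern reference (1.0) and a simple two-source mixture. Real admixture tools work on many thousands of markers and this is not 23andme's algorithm; it only shows how a reference that already carries some Near Eastern ancestry absorbs it.

```python
def near_eastern_share(sample: float, base_ref: float, ne_ref: float) -> float:
    """Two-source mixture on one hypothetical axis: how far the sample sits
    from base_ref toward ne_ref, as a fraction clipped to [0, 1]."""
    share = (sample - base_ref) / (ne_ref - base_ref)
    return min(max(share, 0.0), 1.0)

NE = 1.0        # hypothetical Near Eastern reference position
sample = 0.30   # hypothetical test person

# Base reference built from people with no Near Eastern admixture.
print(near_eastern_share(sample, base_ref=0.0, ne_ref=NE))
# Base reference that already carries the same admixture as the sample.
print(near_eastern_share(sample, base_ref=0.30, ne_ref=NE))
```

The same person scores 30% "Near Eastern" against the first reference and 0% against the second, with no change in their actual genome.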

Then there's the whole issue of the accuracy of the results. Look at the recall numbers for Balkan. They're not very high. So, how confident should consumers be that it's picking up all the Balkan in the first place?
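For readers unfamiliar with the term: recall is the share of truly "Balkan" material that the classifier actually labels "Balkan", while precision is the share of what it labels "Balkan" that really is Balkan. A minimal sketch using the standard definitions, with invented counts (23andme's own figures are in the guide linked below):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts of genome segments whose true origin is "Balkan".
p, r = precision_recall(tp=60, fp=15, fn=40)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up numbers, a recall of 0.60 would mean 40% of the truly Balkan segments were labelled as something else or only at a broader level.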

https://www.23andme.com/ancestry_composition_guide/

For general info on how Ancestry composition works, also see:
https://customercare.23andme.com/hc...-Reference-populations-in-Ancestry-Compositio

As I said, that's why there's push back against these consumer tests. They're being used in ways never intended.

@Pax Augusta

Good to know. Thanks for clearing that up.

Pax Augusta · Apr 30, 2016

Angela said:
How could anyone think that a "Balkan" percentage on 23andme based on the current residents of the entire Balkan region can tell us that? Balkan isn't just Greek. It's how 23andme has chosen to cluster a number of modern populations, and it's an extremely problematic cluster at that.

Here it is for those who seem to have forgotten what populations are included:
View attachment 7703

For clarity, Croatians are indeed included in the Balkan reference set. Does no one check people's claims?

The attachment doesn't work ("Invalid Attachment specified"). Is it a picture?

Angela said:
@Pax Augusta

Good to know. Thanks for clearing that up.

Thank you!

Hauteville · Apr 30, 2016

Maltese in the Balkan cluster and Cypriots in the Middle Eastern cluster: that doesn't make much sense.

Angela · Apr 30, 2016

Pax Augusta said:
The attachment doesn't work "Invalid Attachment specified". Is it a pic?

Thank you!

Pax, are you logged in? It works when I click on it.

You know what? I'll just upload the picture to tiny pics or something so people can see it even if not logged in.

Pax Augusta · Apr 30, 2016

Angela said:
Pax, are you logged in? It works when I click on it.

You know what? I'll just upload the picture to tiny pics or something so people can see it even if not logged in.

Yes, I logged in and it doesn't work. It's better if you post the pic using an image hosting site, thanks.

Angela · Apr 30, 2016

Angela said:
Pax, are you logged in? It works when I click on it.

You know what? I'll just upload the picture to tiny pics or something so people can see it even if not logged in.

Here is the list of reference populations for "Balkan":
http://postimg.org/image/e86ev28pd/

Greece, Romania, Bulgaria, Croatia, Bosnia and Herzegovina, Serbia, Macedonia, Albania, Montenegro, Malta.

Pax Augusta · Apr 30, 2016

Angela said:
Here is the list of reference populations for "Balkan":
http://postimg.org/image/e86ev28pd/

Greece, Romania, Bulgaria, Croatia, Bosnia and Herzegovina, Serbia, Macedonia, Albania, Montenegro, Malta.

Interesting! Why is Malta in the Balkan reference population?

Hauteville said:
Maltese in the Balkan cluster and Cypriots in the Middle Eastern cluster: that doesn't make much sense.

Indeed!

Hauteville · Apr 30, 2016

Pax Augusta said:
Interesting! Why is Malta in the Balkan reference population?

Maybe the managers at 23andme don't know geography? Jokes aside, I have seen some Maltese results, and they get mostly Italian; they look like Sicilians, but with a touch more North African.

Angela · Apr 30, 2016

Pax Augusta said:
Interesting! Why is Malta in the Balkan reference population?

I don't know. It doesn't make sense to me. Neither does the inclusion of Greek Cypriots with "Middle East", which is really the Near East minus Jordanians, Palestinians, Saudi Arabians, Yemenites etc. Putting them with North Africa doesn't make all that much sense to me either. There's definitely similarity between these latter groups and Egyptians, but it fades dramatically as you go closer to northwest Africa.

This is a problem with all these tests. There's no absolute break genetically between population groups. Everything is on a cline. Where you draw the lines is going to change all the percentages.
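A toy example of the cline point, assuming 11 hypothetical samples spread evenly along a single gradient with no natural gap. Moving the dividing line between two labels changes the percentages even though the samples themselves never change.

```python
def cluster_shares(positions, boundary):
    """Label each sample 'A' if its position is below the boundary,
    else 'B', and return the share of samples carrying each label."""
    n = len(positions)
    a = sum(1 for x in positions if x < boundary)
    return {"A": a / n, "B": (n - a) / n}

# A smooth cline: hypothetical samples evenly spaced from 0.0 to 1.0.
cline = [i / 10 for i in range(11)]

for boundary in (0.35, 0.5, 0.65):
    print(boundary, cluster_shares(cline, boundary))
```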

Look at DNA.land. They don't have all that many samples there, so it's even harder for them to form reasonable clusters, and these are top-notch people working on that, from what I remember reading.

What they have so far is an "Italian" cluster that, judging from the map, runs from a big chunk of France through Switzerland, Austria, part of Slovenia, and Italy all the way down to and including Tuscany and perhaps Umbria. So, does that mean that northern Italians are indistinguishable from Austrians? Obviously not, if you look at the results of any other genetic test or academic analysis of these two populations. What it's doing is creating a scenario where lots of northern Europeans get big "Italian" scores. An unsophisticated consumer can get the completely incorrect impression that they have "recent" Italian ancestry.

All it's really saying is that populations near one another geographically have overlap, which anyone with any common sense should already know.

Hauteville · Apr 30, 2016

I have also seen some MENAs who score some European percentage, and they score Italian. Maybe Roman contacts? It's weird, though.

Angela · Apr 30, 2016

"Italian" is actually a more meaningful category in 23andme. At least it's only based on one country's results, rather than the many countries used as references for the "Balkan" cluster.

Italian

The peninsula of Italy is home to a genetic legacy not only of the Roman Empire, but also of groups from both northern Europe and the eastern Mediterranean that occupied Italy at various points in its history.

Population	Source	Sample Size
Italy	23andMe	556
Italy	1000 Genomes	98
North Italian	HGDP	13
Tuscan	HGDP	8

Still, as can be seen from the list, it's based heavily on testees, the vast majority of whom have ancestry from southern Italy and Sicily, which explains why southern Italians get such high "Italian" scores.
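The arithmetic behind "based heavily on testees", using only the sample sizes in the table above:

```python
# Sample counts for the "Italian" reference panel, as listed in the table.
panel = {
    "Italy (23andMe)": 556,
    "Italy (1000 Genomes)": 98,
    "North Italian (HGDP)": 13,
    "Tuscan (HGDP)": 8,
}

total = sum(panel.values())
for source, n in panel.items():
    # Each source's share of the whole panel, as a percentage.
    print(f"{source:22} {n:4d} {n / total:6.1%}")
print(f"{'Total':22} {total:4d}")
```

About 82% of the panel comes from 23andme customers, and under 2% from the HGDP North Italian samples.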

When the algorithm is applied to a sample from, say, Friuli, a good portion can't be categorized as "Italian", so it goes up the pyramid and is labelled generically as "Southern European". Really, it's also "Italian", as the algorithm would show if it had hundreds of samples from the Veneto and Friuli.
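A simplified sketch of the "goes up the pyramid" behaviour, assuming the classifier produces a probability for each fine-grained label and rolls up to the parent label when no single fine label is confident enough. The tree, the 0.8 threshold, and the probabilities are all hypothetical; this shows the general idea, not 23andme's code.

```python
# Hypothetical two-level tree: each fine-grained label maps to its parent.
PARENT = {
    "Italian": "Southern European",
    "Iberian": "Southern European",
    "Balkan": "Southern European",
    "British & Irish": "Northwestern European",
}

def assign(probs: dict[str, float], threshold: float = 0.8) -> str:
    """Return the best fine label if it clears the threshold; otherwise
    sum its siblings and try the parent; otherwise leave unassigned."""
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best
    parent = PARENT[best]
    parent_prob = sum(p for label, p in probs.items() if PARENT[label] == parent)
    if parent_prob >= threshold:
        return "Broadly " + parent
    return "Unassigned"

# A segment that clearly matches the (southern-heavy) Italian reference.
print(assign({"Italian": 0.90, "Iberian": 0.05, "Balkan": 0.05}))
# A segment that looks Southern European but not confidently "Italian".
print(assign({"Italian": 0.55, "Balkan": 0.30, "British & Irish": 0.15}))
```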

Let's take the Spezzino sample again. The result showed 51% "Italian", versus, say, 32% for someone from the Veneto, and yet the "Southern European" score for the Spezzino is 73%, versus scores of 76% that I've seen for people from the Veneto. Does that mean the person from the Veneto is more "southern", in a common-sense way, than a Spezzino, because the Spezzino may get a higher percentage of the 23andme "Northern European" cluster? (Southern Italians, of course, get even higher Southern European numbers.)
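The gap between the two figures can be worked out directly from the percentages quoted above, bearing in mind that the Veneto numbers are typical values rather than a single report:

```python
# Percentages quoted in the post ("Italian" and total "Southern European").
results = {
    "La Spezia": {"Italian": 51, "Southern European": 73},
    "Veneto": {"Italian": 32, "Southern European": 76},
}

for place, r in results.items():
    # Southern European share that was not resolved to "Italian".
    other = r["Southern European"] - r["Italian"]
    print(f"{place}: {other} points of Southern European not labelled Italian")
```

The Veneto figures leave twice as much "Southern European" outside the "Italian" label (44 points versus 22), which fits the roll-up explanation above.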

In one sense yes, but in another sense perhaps not, because the "Southern European cluster" includes the "Balkan" cluster which contains people not only from Malta and Greece and Albania (actually a higher percentage) but also people from Croatia, which has a slightly more "Northern" affinity. So, once again, this "Balkan" cluster is problematic. I could make the argument that "Balkan" in southern Italians, including Sicilians, isn't Greek but Albanian, and comes from documented Albanian gene flow into lots of areas of southern Italy. "Balkan" in the Veneto could be "Croatian", on the other hand. The "Balkan" cluster is too broad to pin it down any better.

Of course, the differences between the different "Balkan" groups are exaggerated. That's even without considering the Greeks, who some companies group with southern Italians, not with Bulgarians or Romanians.

See:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105090

Indeed, AncestryDNA looks at all of this in a very different way, from what I understand. There, southern Italians and Greeks form one genetic cluster, separate both from northern and central Italians and from actual Balkanites. I haven't sent them my data, nor have many of my shares, so I don't know how that works out, but I can see some logic there.

@Hauteville,
I don't know. How much do they get? Considering how southern Italians and Sicilians dominate the reference population for the "Italian" cluster, it could just be picking up overlap between these groups of Italians and some Middle Easterners. Is it just a remnant from a shared Neolithic farmer past or is it more recent? If more recent, how recent? Romans indeed? Could it be from the Crusader Era? There certainly were a lot of Italian merchants who had emporia and actual cantons there, but if it's a big number I don't think that's plausible. If it's a few percent, then perhaps.

This is why I don't think these kinds of questions can be answered with modern DNA, and especially not when people choose one test or another based on some sort of agenda.

Yetos · Apr 30, 2016

Guys, until the 1930s there was a big Makedonian community in Malta, and even Greek Cypriots used it as an emporium for trade with Britain and Spain. It was estimated at about 3,000 to 6,000 people in 1820, in both Vallet and Rambato. The first count of Malta, in 1190, was Mαργαριτωνης, the owner of a fleet. Kαλαμιας was a Greek from Damascus who settled there. The Baring brothers and Hottinguer traded with Greeks there. You can find more in the works of Febbrazo and Αλεξανδρος Λετσας. Today around 700 live there.

Anyway, geographically Malta is not in the Balkans, and neither are Cyprus, Hungary, Moldova, Slovenia, or Asian Turkey, but they are often added as an extension of the Balkans.

https://en.wikipedia.org/wiki/Greeks_in_Malta

As for Cyprus, it received a lot of people from Syria, Palestine, etc., who moved there under the pressure of Islam and after the failure of the Crusades.
