Sardinian Y-DNA Phylogeny per Francalacci et al. 2013

sparkey

The original paper is here, although it is probably easier to refer to Dienekes' analysis here. The biggest takeaway is this phylogenetic tree (also available in Dienekes' analysis):



When discussing Sardinian settlement during the Neolithic, it's easy to conclude that Sardinians were I2a1a-dominant, considering that I2a1a derives from a Paleolithic lineage, has an earlier TMRCA than most European lineages, and is found in multiple Neolithic European samples. But this phylogeny does not support that at all: it shows an obvious founder effect, with a more recent expansion than even the Sardinian R1b. Rather, it looks like early Sardinians would have been mainly J2b, J2a and G2a, with E1b a possibly important minority.

Something less important to most people but exciting to me is that we've finally got a peer-reviewed study that tested for I2c! They make up less than 1% of the total samples, but it's an indication that I2c extends at trace levels down to Sardinia, which we haven't seen before, and for once we can conclude that this is indeed I2c and not its cousin, I2b-ADR. Its early splitting point in Sardinia could be an indication that I2c is relatively ancient in the Mediterranean, as opposed to being more ancient in modern Germany, as I've proposed before.
 
It seems the structure directly under F is slowly unravelling. That's nice.
 
I don't see the point of analysing the European Y-chromosomal phylogeny from a Sardinian standpoint. That's a bit like looking at the world through the fisheye lens of a peephole. Everything gets distorted. That's why I2a1a looks like the most important lineage in Europe when it is pretty much restricted to Sardinia and Iberia. Why didn't they take samples from strategically chosen locations in Europe so as to increase the resolution of all subclades ? Good sample regions would have been places with high genetic diversity, such as the Latium, Campania, Sicily, Greece, Albania, Bulgaria, Romania, Moldova, Ukraine, Serbia, Hungary, Austria, Germany, Switzerland... then throw in Sweden + Finland or Estonia for northern haplogroups (I1 and N).

I wouldn't purchase the article without knowing for sure, but it would really be a shame if the authors didn't at least provide detailed frequencies for Sardinia. 1200 samples tested for deep Y-DNA subclades is a great opportunity to determine once and for all what subclades the Vandals carried (see my analysis here), especially the subclades of R1a and R1b, but also the subclades of I1 and I2a2a. Frequencies broken down by region would also help us see more clearly where the Phoenicians and the Romans settled in Sardinia (or at least where their Y-DNA ended up today). Since the Romans are the only possible candidate who could have brought R1b-U152 to Sardinia (probably the only place in Western Europe that was never settled by any other Italic or Celtic tribe), we could know what subclades of U152 the Romans really carried (L2+, Z36+, Z56+, others?), since continental Italy is too mixed to tell.
 
I got a copy of the article, but it is disappointingly messy. All I wanted was a frequency table, ideally divided by region, but there isn't even one for the 1200 samples as a whole. This study is effectively useless for determining the geographic distribution of each haplogroup and subclade within Sardinia.

The paper merely includes a numbered list of individuals sorted by haplogroup. I had to calculate the frequencies myself (see table below). As if that weren't inconvenient enough, they didn't bother to list the SNPs next to the subclade names, and deep clades aren't mentioned either. I had to check the supplementary materials for deep subclades, but the tables there are far from clear.

Haplogroup    Samples   Percentage
A1b1b2b       6         0.5%
E1a1          6         0.5%
E1b1b1a1      32        2.6%
E1b1b1b1      70        5.8%
E1b1b1b2      24        2%
F3            7         0.6%
G2a2b         40        3.3%
G2a3          91        7.6%
I1a3a2        2         0.16%
I2a1a         465       38.75%
I2a1b         2         0.16%
I2a2a         10        0.8%
I2c           10        0.8%
J1c           63        5.25%
J2a           74        6.1%
J2b           23        2%
L             7         0.6%
T             27        2.25%
Q1a3c         1         0.08%
R1a1a1        15        1.25%
R1b1a2        185       15.4%
R1b1c         29        2.4%
R2a1          10        0.8%
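Since the frequencies had to be computed by hand, here is a minimal Python sketch of that calculation. It assumes the counts exactly as transcribed in the table (they sum to 1,199) and a denominator of 1,200, which is what the listed percentages appear to use (e.g. 465/1200 = 38.75%). On that basis the single Q1a3c sample comes to about 0.08% of the total.

```python
# Sample counts per haplogroup, transcribed from the frequency table above.
COUNTS = {
    "A1b1b2b": 6, "E1a1": 6, "E1b1b1a1": 32, "E1b1b1b1": 70, "E1b1b1b2": 24,
    "F3": 7, "G2a2b": 40, "G2a3": 91, "I1a3a2": 2, "I2a1a": 465,
    "I2a1b": 2, "I2a2a": 10, "I2c": 10, "J1c": 63, "J2a": 74, "J2b": 23,
    "L": 7, "T": 27, "Q1a3c": 1, "R1a1a1": 15, "R1b1a2": 185,
    "R1b1c": 29, "R2a1": 10,
}

# Assumption: the table's percentages appear to use a round denominator of
# 1200 (465 / 1200 = 38.75%), not the exact number of samples listed.
DENOMINATOR = 1200


def frequencies(counts: dict[str, int], denominator: int) -> dict[str, float]:
    """Return each haplogroup's count as a percentage of `denominator`."""
    return {hg: 100 * n / denominator for hg, n in counts.items()}


if __name__ == "__main__":
    print("Sum of transcribed counts:", sum(COUNTS.values()))
    for hg, pct in frequencies(COUNTS, DENOMINATOR).items():
        print(f"{hg:<10}{COUNTS[hg]:>5}{pct:>8.2f}%")
```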


I am baffled by the great discrepancy between this study of Sardinians and the total of the previous studies I compiled, which had nearly 1100 samples. Previous studies had 13% of G2a against 10.9% here, 8.5% of E1b against 11% here, 2.5% of J1 against 5.2% here, 9.5% of J2 against 8% here, 1% of L+T against 2.8% here, 1% of Q against 0.08% here, and 1% of I1 against 0.16% here... That's quite a lot of differences for two huge studies of a relatively sparsely populated island (1.6 million inhabitants).
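Whether gaps like these exceed what sampling noise alone would produce can be roughly gauged with a pooled two-proportion z-test. The Python sketch below is only an illustration under loud assumptions: sample sizes of 1,100 for the compiled studies ("nearly 1100") and 1,200 for this one, and simple random sampling from a single population, which a compilation of regionally uneven studies is not. A |z| above about 1.96 would arise by chance less than 5% of the time if both sets of samples really came from the same distribution.

```python
import math

# Frequencies (%) quoted in the post: (previous compiled studies, this study).
COMPARISONS = {
    "G2a": (13.0, 10.9),
    "E1b": (8.5, 11.0),
    "J1": (2.5, 5.2),
    "J2": (9.5, 8.0),
    "L+T": (1.0, 2.8),
}
N_PREVIOUS = 1100  # assumption: "nearly 1100" samples in the compiled studies
N_CURRENT = 1200   # assumption: denominator used for this study's table


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: both samples share one frequency."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


if __name__ == "__main__":
    for hg, (prev, curr) in COMPARISONS.items():
        z = two_proportion_z(prev / 100, N_PREVIOUS, curr / 100, N_CURRENT)
        verdict = "beyond" if abs(z) > 1.96 else "within"
        print(f"{hg:<4} z = {z:+.2f} ({verdict} ±1.96)")
```

Under these assumptions the J1 and L+T gaps come out far beyond ±1.96, while the G2a and J2 gaps stay within it.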

If the frequencies can vary by 2 or 3 percentage points for most haplogroups between two big studies, how much confidence can we have in regional data with fewer than 500 samples, let alone fewer than 100? Yet, as far as Italy is concerned, only Sardinia, Sicily and Trentino-South Tyrol have over 500 samples if we combine all studies to date. Not even Tuscany! And Italy is one of the best-studied countries in the world for Y-DNA. That makes me wonder about the accuracy of the present data for every country. There could be huge changes to come, in the order of 10 to 20% for a single haplogroup in poorly sampled countries.
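The effect of sample size on a single frequency estimate can be sketched with the usual normal-approximation margin of error, z·√(p(1−p)/n). This is a minimal Python illustration assuming simple random sampling and a hypothetical haplogroup at 10%; real regional sampling is clustered, which if anything widens the true uncertainty, and the approximation becomes poor for very rare haplogroups.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a frequency p estimated from n samples.

    Uses the normal approximation to the binomial and assumes simple random sampling.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    p = 0.10  # hypothetical haplogroup frequency of 10%
    for n in (100, 500, 1200):
        print(f"n = {n:>4}: {100 * p:.0f}% ± {100 * margin_of_error(p, n):.1f} percentage points")
```

For a 10% haplogroup this gives roughly ±5.9 points at n = 100, ±2.6 at n = 500 and ±1.7 at n = 1200.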

---

UPDATE I:

I checked the SNP list in the supplementary data and found 128x R1b-U152 (10.6% of the total). Among them there were 10x L2 (including 2x L20), 13x Z56 (including 7x Z144) and 105x U152*. These should be the Roman subclades I was looking for. Of course it's always possible that all Italic and Alpine Celtic peoples possessed all subclades of U152, in which case my quest is a dead end. But it rather looks like the Romans belonged to a branch that has not yet been identified and shows up as U152*.

As for other subclades of R1b there are 29x V88 (2.4%), 10x M269* (0.8%), 9x L23* (0.75%), 4x DF27>Z196>Z209 (including 2x Z216+), 2x L21>DF13>L513/DF1, and 2x U106>Z381 (including 1x Z301>L47>Z9). L23 is the dominant form of R1b in Sardinia. Since the Greeks had a very limited presence on the island, and the Etruscans never settled Sardinia, L23 was surely the other main Roman R1b subclade along with U152.

UPDATE II:

As for the Vandals, they surely brought R1b-U106 (Z381+), but also R1b-L21 (L513+), which has been found in Sweden (in addition to Germany and the British Isles). R1b-DF27 (Z209+) is more problematic since it is found in all of western Europe, from Sweden to Spain. It could be Vandalic, but could just as well be Catalan if the samples came from north-west Sardinia. It's really a shame that such a large-scale study does not provide the geographic location of the samples tested. That could have resolved this question easily.

Both samples of I1 are listed as I1a3a2, a subclade that isn't listed by ISOGG. I suppose it is a new subclade under I1a3a (L1237). I1a3 is found all over Europe, including, interestingly, Poland, Spain and southern Italy, which could confirm the Vandal connection.

Among the I2a2a (M223) individuals there were 6x L701>L699 (also found in Sicily), 1x CTS616* and 2x L1228 (a newly identified subclade splitting just after M223).

Among the 15 R1a1a individuals were 11x Z282 (among which 5x Z280, 1x M458>L1029 and 5x probably M458>L260) and 4x Z93 (including 3x Z94>L342.2>Z2123). The Z2123 is Middle Eastern or Central Asian and could have been brought either by the Phoenicians or the Alans.

The fact that the Vandals stayed in Poland before migrating to the Roman Empire is probably the reason for the elevated R1a (and the fact that it belongs to the Central European Slavic or rather Proto-Slavic branch). It is therefore likely that the two I2a1b-M423 samples are also of Proto-Slavic origin.

Based on this (perhaps not representative) Sardinian snapshot, the Vandals seem to have carried 33% of R1a (11 samples), 30% of I2a2a (10 samples), 24% of R1b (8 samples), 6% of I2a1b (2 samples), and only 6% of I1 (2 samples).
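These percentages are each haplogroup's share of the 33 samples the post attributes to the Vandals. A short Python check of that arithmetic, taking the attribution itself as given (it is the author's hypothesis, not something the paper reports):

```python
# Samples the post attributes to the Vandals (the attribution is the author's hypothesis).
VANDAL_CANDIDATES = {"R1a": 11, "I2a2a": 10, "R1b": 8, "I2a1b": 2, "I1": 2}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's percentage of the combined total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(VANDAL_CANDIDATES.values()))
    for name, pct in shares(VANDAL_CANDIDATES).items():
        print(f"{name:<6}{pct:5.1f}%")
```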

Another remarkable thing is that I2-M223 is the main Germanic haplogroup, as prevalent as all the I1 and R1b combined. The Vandals are thought to have originated in central or northern Sweden, where I2-M223 is higher than elsewhere in Scandinavia.
 
I don't see the point of analysing the European Y-chromosomal phylogeny from a Sardinian standpoint. That's a bit like looking at the world through the fisheye lens of a peephole. Everything gets distorted. That's why I2a1a looks like the most important lineage in Europe when it is pretty much restricted to Sardinia and Iberia.

True, if you're trying to extract the European history of these haplogroups and their subclades, you'll get totally the wrong picture looking at Sardinia. It makes I2a1a look young and widespread, when in fact it is old and rare. But I think we can get some use out of it anyway. In particular, it can help us determine the genetic history of Sardinia (only) by isolating founder effects. Of course, with the remaining haplogroups, we still need to sort through whether they are diverse because they entered Sardinia early, or because the populations that brought them already had a diverse distribution of that haplogroup.

I'm thinking of J2b in particular--it's usually thought to have had its primary spread in Europe later than J2a, but it looks at least as diverse in Sardinia. Why?

Why didn't they take samples from strategically chosen locations in Europe so as to increase the resolution of all subclades ?

Research budget?
 
For some reason I can't read the whole article. Can someone provide information about the SNPs they tested for I-L160? Did they test only L160, or also SNPs below it? It would be very interesting to see whether the Sardinians are solely I-PF4190 or have some admixture of I-F1295 or I-PF4088.
 
I got a copy of the article, but it is very disappointing. All I wanted was a frequency table, ideally divided by region, but there isn't even one for the 1200 samples as a whole. Fortunately there is a numbered list of individuals sorted by haplogroup, so I can calculate the frequencies myself. As if that weren't inconvenient enough, they didn't bother to list the SNPs next to the subclade names. I'll have to check the Supplementary Materials.

Haplogroup    Samples   Percentage
A1b1b2b       6         0.5%
E1a1          6         0.5%
E1b1b1a1      32        2.6%
E1b1b1b1      70        5.8%
E1b1b1b2      24        2%
F3            7         0.6%
G2a2b         40        3.3%
G2a3          91        7.6%
I1a3a2        2         0.16%
I2a1a         465       38.75%
I2a1b         2         0.16%
I2a2a         10        0.8%
I2c           10        0.8%
J1c           63        5.25%
J2a           74        6.1%
J2b           23        2%
L             7         0.6%
T             27        2.25%
Q1a3c         1         0.08%
R1a1a1        15        1.25%
R1b1a2        185       15.4%
R1b1c         29        2.4%
R2a1          10        0.8%


Too bad those figures are useless for my purposes. They don't mention the subclades of R1a1a or R1b1a2. There is the subclade of I1 but it was already available in the free chart. Actually, even after checking the supplementary materials I am still not sure what SNP defines their I1a3a2, which isn't listed by ISOGG. It might be a new subclade of I1a3a (L1237).

---
UPDATE: I checked the SNP list in the supplementary data and found a dozen R1b-U152. The SNPs matching known subclades are L20 under L2, Z56, and Z144 under Z56, so these could be the Roman ones I was looking for. Of course it's always possible that all Italic and Alpine Celtic peoples possessed all subclades of U152, in which case my quest is a dead end. Anyway, most of the U152 in Sardinia is U152*.

Among the non-U152 R1b1a2 there are four L23, a few DF27 (Z195, S230 and Z216), one L21 (DF1/L513), one L11, and one U106 (Z381 > Z301 > L47 > Z9). The last one is surely the Vandal R1b I was looking for. The L11 could also be Scandinavian.

Two subclades of I2a2a (M223) were identified: L701>L699 and L1228.

The R1a1a individuals have the following subclades: Z93>Z94 (Phoenician), Z280 (Balto-Slavic?), Z282 (European), M458>L1029 (Central European?). No typically Germanic subclade. I wonder how the Slavic-looking R1a got there. The Goths? There are also a couple of samples of I2a1b (M423) to support that.

---

I am also surprised by the great discrepancy between this study of Sardinians and the total of the previous studies I compiled, which had nearly 1100 samples. Previous studies had 13% of G2a against 10.9% here, 8.5% of E1b against 11% here, 2.5% of J1 against 5.2% here, 9.5% of J2 against 8% here, 1% of L+T against 2.8% here, 1% of Q against 0.08% here, and 1% of I1 against 0.16% here... That's quite a lot of differences for two huge studies of a relatively sparsely populated island (1.6 million inhabitants).

If the frequencies can vary by 2 or 3 percentage points for most haplogroups between two big studies, how much confidence can we have in regional data with fewer than 500 samples, let alone fewer than 100? Yet, as far as Italy is concerned, only Sardinia, Sicily and Trentino-South Tyrol have over 500 samples if we combine all studies to date. Not even Tuscany! And Italy is one of the best-studied countries in the world for Y-DNA. That makes me wonder about the accuracy of the present data for every country. There could be huge changes to come, in the order of 10 to 20% for a single haplogroup in poorly sampled countries.

Where did you get your haplotype numbers from?

This is what I got for R-U152.
1204 tested, including:
1 x French Basque I2a1a
1 x North Italian I2a1a
1 x Tuscan G2a
1 x Corsican G2a

Of the remaining samples, 128 were R-U152:
75 x Z192+ = 58.6%
23 x (xL2, Z36, Z56, Z192) = 18.0%
13 x Z56+ = 10.2%
10 x L2+ = 7.8%
7 x Z36+ = 5.5%

It seems the majority were Z192+.
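The percentages in this post are shares of the 128 R-U152 samples, not of the whole sample. A small Python sketch (the `inconsistent_rows` helper is hypothetical, written for this check) confirms that the listed counts add up to 128 and that each percentage matches its count to one decimal place:

```python
# (count, reported % of R-U152) as listed in the post above.
REPORTED = {
    "Z192+": (75, 58.6),
    "U152 (xL2, Z36, Z56, Z192)": (23, 18.0),
    "Z56+": (13, 10.2),
    "L2+": (10, 7.8),
    "Z36+": (7, 5.5),
}


def inconsistent_rows(rows: dict[str, tuple[int, float]]) -> list[str]:
    """Names whose reported percentage is not the count's share of the total, to 1 decimal."""
    total = sum(n for n, _ in rows.values())
    # Rounding to one decimal place allows a difference of at most 0.05.
    return [
        name
        for name, (n, pct) in rows.items()
        if abs(100 * n / total - pct) > 0.05 + 1e-9
    ]


if __name__ == "__main__":
    print("U152 total:", sum(n for n, _ in REPORTED.values()))
    print("inconsistent rows:", inconsistent_rows(REPORTED) or "none")
```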

I think it looks like the Bell Beaker crashed into the Stelae in Sardinia

Ötzi's G-L91 was not present, but there was G-L497 in abundance.
 

I made a mistake while counting. I have now corrected the numbers.

What is Z192? I have never heard of it; it is not listed by ISOGG or FTDNA, and I don't see it in this paper either. I see a bunch of 75x R1b-U152 samples, but there is no SNP assigned to them.
 






new tree - 23 July 2013
 
Ok, but how did you determine that there were 75x Z192 in the Sardinian data ?

Here is a table Richard Rocca put on another forum also discussing this topic, please feel free to join in :)
http://www.anthrogenica.com/showthread.php?1180-U152-in-Sardinia-Francalacci-et-al-2013


Francalacci_U152.png
 
Thanks, that is very useful.

Interestingly, of the 23 samples in the 'U152+ L2- Z36- Z56- Z192-' group, there are:

9 with 23407934 C>G
and 3 with 23119461 G>A


That still leaves 11 U152*. Unfortunately, neither of the above SNPs is on the Geno 2.0 chip or available to test at FTDNA. So my Y remains a U152* 'aimlessly' wandering Europe since U152 :)
 
Among the 15 R1a1a individuals were 11x Z282 (among which 5x Z280), 3x Z93 (Z94>L342.2>Z2123) and 1x M458 (L1029+). The Z2123 is Middle Eastern and was probably brought by the Phoenicians. Although there is no typically Germanic subclade, Z282 and L1029 are not exclusively Balto-Slavic and have also been found in Sweden and Germany. If all of it was brought by the Vandals (who else ?), it looks like the Vandals were heavy on R1a.

The fact that the Vandals stayed in Poland before migrating to the Roman Empire is probably the reason for the elevated R1a (and the fact that it is so Slavic-looking). It is therefore likely that the two I2a1b-M423 samples are also of Proto-Slavic origin.

Z2123 is very widespread from India to the Bashkirs to Germany. If it does prove to be Middle Eastern in origin I imagine it was probably brought into Sardinia during the Roman or Byzantine Empires. Or if Z2123 originated in the steppe, then the Alans are the likely source along with perhaps the Q1a3c.

It's not really Slavic-looking at all. The Z282 and Z280 are probably remnants of the Pomeranian Culture (650-200 BCE) and the Lusatian Culture (1300-500 BCE), which are by most accounts Baltic with strong connections to the Nordic Bronze Age.
 
Z2123 is very widespread from India to the Bashkirs to Germany. If it does prove to be Middle Eastern in origin I imagine it was probably brought into Sardinia during the Roman or Byzantine Empires. Or if Z2123 originated in the steppe, then the Alans are the likely source along with perhaps the Q1a3c.

Thanks for mentioning the Alans. They had slipped my mind, but they make excellent candidates for the R1a-Z2123 and Q1a3c samples, especially since there was indeed an Alanic minority among the Vandals. I had already explained that here.

It's not really Slavic-looking at all. The Z282 and Z280 are probably remnants of the Pomeranian Culture (650-200 BCE) and the Lusatian Culture (1300-500 BCE), which are by most accounts Baltic with strong connections to the Nordic Bronze Age.

I consider that Proto-Slavic as they descended from the Corded Ware, which was also the culture from which sprang all Balto-Slavic people. Obviously it can be said that Germanic people are partly Proto-Slavic. But I wouldn't call the Lusatian culture Proto-Germanic since it had little if any I1, I2a2a and R1b-S21.
 
The Z2123 in Sardinia is of Arabian origin. It's probably also present in Southern Italy and Sicily.

Alans are a poor explanation for the Z2123 in Sardinia, because Sardinians are one of the few populations in Europe that lack the West Asian genome-wide component. So the Z2123 had to be carried there from its home, which was most likely South Central or West Central Asia, via a less West Asian and more Mediterranean/Middle Eastern source, like Arabs from the Arabian Peninsula and/or East Mediterranean.
 

The Arabs never colonised Sardinia.

There were only 3 R1a-Z2123 samples in Sardinia, i.e. 0.25% of the sample. That is consistent with the very small size of the Alanic contingent, and in good proportion to the 34 samples that could be of Vandalic origin (2.8%).
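The two percentages in this reply can be reproduced directly, assuming the same round 1,200-sample denominator used for the frequency table. The third figure, Z2123 relative to the 34 possible Vandalic samples, is a derived ratio not stated in the post, included only to make "in good proportion" concrete.

```python
# Counts quoted in the post; the 1200 denominator is an assumption
# (the round total used for the frequency table earlier in the thread).
TOTAL = 1200
Z2123 = 3
POSSIBLE_VANDALIC = 34


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"R1a-Z2123: {pct(Z2123, TOTAL):.2f}% of the sample")
    print(f"Possible Vandalic samples: {pct(POSSIBLE_VANDALIC, TOTAL):.1f}% of the sample")
    print(f"Z2123 relative to possible Vandalic samples: {pct(Z2123, POSSIBLE_VANDALIC):.1f}%")
```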

I suppose that you mean that the Sardinians lack the East European autosomal component, not the West Asian (which according to the Dodecad K=12 admixtures is at 4.6%).
 
...
2x L21>DF13>L513/DF1
...
As for the Vandals, they surely brought R1b-U106 (Z381+), but also R1b-L21 (L513+), which has been found in Sweden (in addition to Germany and the British Isles).

Thank you Maciamo. I've downloaded the three Supplementary files but haven't figured them out yet. I would like to find the R1b-L21>DF13>L513+ (aka DF1) individuals and see whether I can tell anything about further subgrouping. There are some 25-odd clusters under L513, and they have different distributions, although no doubt biased towards the British Isles. The "M" cluster has a couple of Germans. The "B2", where I sit, has a Frenchman and a Swede to go with the Irish and the Welsh. We also have Dutch members, but by far most of the individuals are from the British Isles.

Several of the subgroups are marked by SNPs, including these:
L193, L706.2, L705.2, CTS3087, Z1867, P66, L69, L577

If we could figure out what subgrouping the Sardinian L513/DF1 is, that would be very helpful.
R1b-L513_Descendency_Tree.jpg
 
Among the 15 R1a1a individuals were 11x Z282 (among which 5x Z280), 3x Z93 (Z94>L342.2>Z2123) and 1x M458 (L1029+). The Z2123 is Middle Eastern and was probably brought by the Phoenicians. Although there is no typically Germanic subclade, Z282 and L1029 are not exclusively Balto-Slavic and have also been found in Sweden and Germany. If all of it was brought by the Vandals (who else ?), it looks like the Vandals were heavy on R1a.


Isn't M458 also a subclade of Z282?

So shouldn't it be 12x Z282 [6x* / 5x Z280 / 1x M458 (L1029+)]?

Just asking.
 
My comments below are actually reposted from the Molgen forum where Maciamo's post was reposted yesterday. :)

As for other subclades of R1b there are 29x P25 (2.4%), 10x M269 (0.8%), 9x L23 (0.75%), 4x DF27>Z196>Z209 (including 2x Z216+), 2x L21>DF13>L513/DF1, and 2x U106>Z381 (including 1x Z301>L47>Z9).
The above numbers are a bit confusing. For example, there were no P25* cases in that Sardinian sample, and all R1b members were actually P25+, so those 29 cases of P25 from your list are actually all members of clade V88.
Generally, there were 214 R1b people tested in that sample and this included 29 members of V88 and 185 members of M269.
The M269 group was further divided into the large clade L23 (175 people) and a much smaller M269* group represented by 10 people belonging to a new (relatively young) subclade of M269 (a novel sister clade of L23).


L23 is the dominant form of R1b in Sardinia. Since the Greeks had a very limited presence on the island, and the Etruscans never settled Sardinia, L23 was surely the other main Roman R1b subclade along with U152.
I would rather say that P312 (including of course U152) is the dominant form of R1b in Sardinia, as the L23 group included only six members of L23(xL51), three members of L51(xL11), three cases of L11(xP312,Z381) and two cases of Z381 (a subclade of U106, though it should be noted that U106 itself was not analyzed). The remaining 161 people were all P312+, and this included 128 members of U152, as you have rightly noticed.
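The figures in this reply form a nested set of counts (214 = 29 V88 + 185 M269, and so on down to P312). A minimal Python sketch of that bookkeeping uses a hypothetical `Clade` class in which each node holds the samples assigned only to it (a paragroup such as M269*) plus its resolved children, so each level's total can be checked. The tree is simplified: U106 is collapsed into Z381, intermediate SNPs are omitted, and P312's 33 non-U152 samples are derived as 161 - 128.

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    """Hypothetical helper: samples assigned only to this clade plus resolved child clades."""

    name: str
    own: int = 0
    children: list["Clade"] = field(default_factory=list)

    def total(self) -> int:
        """Samples in this clade, including all descendants."""
        return self.own + sum(child.total() for child in self.children)


def show(clade: Clade, depth: int = 0) -> None:
    """Print the tree with each clade's total sample count."""
    print(f"{'  ' * depth}{clade.name}: {clade.total()}")
    for child in clade.children:
        show(child, depth + 1)


# Counts from the reply above, on the simplified tree described in the lead-in.
u152 = Clade("U152", own=128)
p312 = Clade("P312", own=33, children=[u152])
z381 = Clade("Z381 (U106)", own=2)
l11 = Clade("L11", own=3, children=[z381, p312])
l51 = Clade("L51", own=3, children=[l11])
l23 = Clade("L23", own=6, children=[l51])
m269 = Clade("M269", own=10, children=[l23])
v88 = Clade("V88", own=29)
r1b = Clade("R1b (P25)", children=[v88, m269])

if __name__ == "__main__":
    show(r1b)
```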


Among the 15 R1a1a individuals were 11x Z282 (among which 5x Z280), 3x Z93 (Z94>L342.2>Z2123) and 1x M458 (L1029+).
These numbers do not seem to be correct. Those 15 R1a members were all R1a-Z645, and in addition to those 11 members of Z282 you have mentioned, there were actually four cases of Z93, including three people tested as Z94+ (all of them indeed Z2123+).

The Z282 group included indeed five Z280 members, but all six remaining Z282 members were likely M458+ (although M458 was not tested, but their M458+ status could be deduced from the positive PF6155 and PF7521 results). Generally, there is a problem with an insufficient SNP detection rate for haplogroup R1a (and generally for all lineages that were not well represented in that Sardinian sample). For example, the only case recognized as L1029+ was assigned as negative for PF6155 and PF7521 (both being upstream of L1029). As for the entire group of eleven Z282 members, only three of them were tested as Z282+ and four were tested as Z283+ (so the Z283/Z282+ status of the entire group was based on some downstream markers only). All this makes it quite likely that all those M458 members identified in Sardinia were either L1029+ or L260+ (as the region encompassing L260 was most likely not analysed).

In addition to Z282 and Z283, there is another new SNP marker uniting M458 and Z280 called CTS11197 (not found in the Z93 branch). Since this mutation was initially identified in the 1KG project, I suspect that it is shared with Z284 (otherwise somebody would have noticed that it separates M458 and Z280 from Z284), but it would be of course reasonable to verify it.

As for the Z280 group, it includes one individual that is likely negative for both CTS1211 and CTS3607. Intriguingly, he is positive for a marker called CTS4648 that was first identified in a 1KG sample (though I don't know in which one). Importantly, this particular marker is included into the Geno 2.0 chip but no R1a member has been tested positive for it, so far.

The remaining four Z280 members seem to be positive for CTS1211 and CTS3607, and I suspect that three of them belong to a specific subclade of CTS1211 (or even a subclade of CTS3402, although CTS3402 was probably not analysed), as they all were positive for at least 12 markers downstream of CTS1211, including CTS8816 (first identified in a 1KG sample but not included into Geno 2.0).


Although there is no typically Germanic subclade, Z282 and L1029 are not exclusively Balto-Slavic and have also been found in Sweden and Germany. If all of it was brought by the Vandals (who else ?), it looks like the Vandals were heavy on R1a.
I don't think that connecting L1029 with Vandals (and generally with some early Germanic tribes) is a reasonable assumption, but let's wait for some aDNA data. :)
 

