Post Formal Stats Here!!

Fire Haired14

I'm opening this thread for posting useful formal stats. By formal stats I mean D-stats, F3-admix and F3-drift, IBD, IBS, qpAdm, TreeMix, and others: basically any type of autosomal DNA analysis that isn't ADMIXTURE or PCA. Two good blogs to follow for useful formal stats are Kalevan ja Untamon geenit and Eurogenes.

Formal stats are what we need to work out where populations fit in a tree-like model, which is why they're so important. Right now no one really knows how various modern Middle Easterners relate to EEF/WHG/EHG/CHG, and formal stats can help resolve that relationship. I'll soon be posting stats that try to figure it out.
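For anyone new to the F3 stats mentioned above, here's a minimal Python sketch, assuming you already have per-SNP allele frequencies for each population as plain lists. It uses the simple uncorrected estimator f3(C; A, B) = mean of (c - a)(c - b) over SNPs; ADMIXTOOLS' qp3Pop also corrects for finite sample size, which this sketch leaves out. The populations and frequencies are hypothetical.

```python
from statistics import fmean


def f3(c, a, b):
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are equal-length lists of allele frequencies (0..1) at the same SNPs.
    A clearly negative value (F3-admix) suggests C is a mixture of populations
    related to A and B. With an outgroup in the C position (F3-drift, i.e. outgroup
    f3), larger values mean A and B share more drift.
    """
    if not (len(c) == len(a) == len(b)):
        raise ValueError("frequency lists must cover the same SNPs")
    return fmean((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b))


# Hypothetical allele frequencies at five SNPs for two source populations.
pop_a = [0.10, 0.80, 0.35, 0.60, 0.05]
pop_b = [0.70, 0.20, 0.55, 0.10, 0.45]
# A hypothetical 50/50 mixture of A and B.
admixed = [0.5 * x + 0.5 * y for x, y in zip(pop_a, pop_b)]

print(f3(admixed, pop_a, pop_b))  # negative: admixture signal
print(f3(pop_a, pop_a, pop_b))    # 0.0: C identical to A gives no signal
```

For an exact 50/50 mix each SNP contributes -(a - b)^2 / 4, which is why the first value is negative whenever the sources differ.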
 
Here are some interesting D-stats I got from Eurogenes and Kalevan ja Untamon geenit (author: Mauri).

Kalevan ja Untamon geenit

In this post, Mauri tried to distinguish Bedouin-type from Anatolia_Neolithic-type Middle Eastern admixture in Europe. He found that affinity to Anatolia_Neolithic and affinity to Bedouin correlate very well: the closer a European is to Anatolia_Neolithic, the closer he is to Bedouin.

In this post, Mauri showed the North/South Europe split in D-stats. He tested whether a large array of ancient European genomes is closer to each of a set of Southern and Northern European populations or to Kent-England. The results confirmed all the patterns we already knew about.

But his post also revealed something new. I put all of his results in this spreadsheet (I estimated the scores from a graph, so they might not be totally accurate). North Italy has about as strong a relationship to EEF as the Basques do, and a much stronger one than South Italy, Greece, and Spain. North/Central Italy is an EEF stronghold we haven't noticed.

Eurogenes

Davidski ran all of these D-stats for me to test whether EEF, CHG, and modern West Asians share common ancient West Asian ancestry: West Asian Kinship. It confirms that Anatolia_Neolithic and CHG are closer to modern West Asians than to Loschbour. It also shows that CHG and EEF aren't significantly closer to each other than to Loschbour.

This collection of D-stats from Eurogenes is very useful: D-stats with 4 ancestors of Europeans. I found that the EHG/MA1 ratio is consistent across all of modern and ancient Europe. This must be because EHG is the primary or sole source of MA1 affinity. Looking at the stats, I have no doubt there's EHG/MA1/Steppe ancestry to some degree in just about all of Europe; nothing in West Asia can explain them. You can see I added results for South and North Asians. It's pretty interesting: it looks like there's a lot of EHG/ANE in both regions (especially Mansi).

It's incredible how consistent the stats are with the narrative we've learned about European origins. With D-stats it's important to remember that a score as small as 0.001 can be significant in modern populations, since what matters is the score relative to its standard error (the Z-score). When populations from the same region keep getting 0.001, that's important. So the scores hardly look different, but those little numbers really matter.
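Whether a D of 0.001 is significant comes down to its standard error, which is usually estimated with a block jackknife over chunks of the genome and reported as a Z-score (D divided by its standard error). Here's a minimal Python sketch of that idea, assuming per-SNP allele frequencies listed in genome order. It uses the standard D formula (Patterson et al. 2012) and a simple unweighted delete-one-block jackknife over equal-sized blocks; ADMIXTOOLS uses a weighted jackknife over blocks defined in centimorgans, so treat this as an illustration only. The block count and the simulated frequencies are arbitrary choices.

```python
import math
import random


def d_terms(w, x, y, z):
    """Per-SNP numerator and denominator terms of D(W, X; Y, Z)."""
    num = [(a - b) * (c - d) for a, b, c, d in zip(w, x, y, z)]
    den = [(a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in zip(w, x, y, z)]
    return num, den


def d_with_jackknife(w, x, y, z, n_blocks=20):
    """Return (D, SE, Z) for D(W, X; Y, Z) using a delete-one-block jackknife.

    SNPs are cut into n_blocks contiguous, roughly equal blocks (a simplified
    stand-in for the genomic blocks real software uses).
    """
    num, den = d_terms(w, x, y, z)
    n = len(num)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and at least one SNP per block")
    total_num, total_den = sum(num), sum(den)
    d_full = total_num / total_den
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    # D recomputed with each block left out in turn.
    loo = [
        (total_num - sum(num[lo:hi])) / (total_den - sum(den[lo:hi]))
        for lo, hi in zip(bounds, bounds[1:])
    ]
    mean_loo = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in loo))
    z_score = d_full / se if se > 0 else float("nan")
    return d_full, se, z_score


rng = random.Random(1)
n_snps = 5000
w = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
y = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
z = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
# Hypothetical X: W-like plus 5% ancestry from Z, so D should come out positive.
x = [0.95 * a + 0.05 * c for a, c in zip(w, z)]

d, se, z_score = d_with_jackknife(w, x, y, z)
print(f"D = {d:.4f}, SE = {se:.4f}, Z = {z_score:.1f}")
```

With thousands of SNPs the standard error gets tiny, which is how a D that looks negligible can still have a large Z.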
 

I hate to be snippy, but while you and other posters here and at Eurogenes haven't noticed that Northern Italy is an EEF stronghold (and to some extent Toscana as well), I've been pointing it out in dozens of threads here since before Lazaridis et al. came out. That's not to say that Spain isn't an EEF stronghold as well, but her demographic history is a little different. You don't need formal stats to see it either. It's all there in the admixture analyses from these various papers.

That said, this is a very nice graphic to compare the southern Europeans, and, as you say, Basques and Northern Italians are pretty close.
http://terheninenmaa.blogspot.fi/2015/12/north-and-south-european-ancient-roots.html

I don't know if you recall, but I posted the thread on the problems with Admixture analyses when they're done incorrectly, so I'm quite aware of the issues. They can be done properly, however, as I think Kurd over at Anthrogenica is doing them. Formal stats also have their issues. They also can be done incorrectly, and can even be manipulated to show what someone wants them to say. There's no perfect method.

It also shows that CHG and EEF aren't significantly closer to each other than to Loschbour.

This one I don't buy.
 
@Angela,

D-stats sometimes give results that seem wrong. You just have to keep testing, and eventually it'll make sense. According to D-stats, Anatolia_Neolithic is closer to Loschbour than to CHG, and CHG is barely closer to Anatolia_Neolithic than to Loschbour. Lots of testing is needed to pin down their relationship. Anatolia_Neolithic might be more WHG than you think: it is significantly closer to HungaryWHG than to Loschbour, which matches geography and suggests it has European WHG ancestry.
 
@Angela,

D-stats sometimes give results that seem wrong. You just have to keep testing, and eventually it'll make sense...

I just figured out why some D-stats give weird results. It'll take a while to follow because of all the letters.

Explanation.

I test whether sample A1 is closer to A2 (from the same population as A1) or to C. Then I test whether population D is closer to A2 or to C. I find that both A1 and D are closer to A2 than to C. But even though A1 and A2 come from the same population, D gets a more extreme score than A1 does, which makes it look as if D is closer to A2 than A1 is.

Why? Because A1 is essentially identical to A2, but A1 also shares ancestry with C that D lacks, and the stat subtracts the affinity to C from the affinity to A2. So WHG (D) comes out closer to Kent-England (A2), as opposed to EEF (C), than Cornwall-England (A1) does.
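The cancellation described above can be reproduced with a toy shared-drift model. This is a hypothetical sketch, not how any real D-stat software works: each population is described by how much of each ancestral drift component it carries, and the shared drift between two populations is the sum over components of loading times loading times drift. The contrast shared(X, Kent) minus shared(X, EEF) is, in expectation under this model, what an f4-statistic of the form f4(Outgroup, X; EEF, Kent) measures when the outgroup branches off at the root (the sign flips if you swap positions). All numbers, including the 0.15/0.40/0.45 split and the small "British" drift term, are made up for illustration.

```python
# Toy model: hypothetical drift variance for each ancestral component.
DRIFT = {"WHG": 1.0, "EEF": 1.0, "Steppe": 1.0, "British": 0.1}

# Hypothetical ancestry loadings. Kent and Cornwall get the same mixture plus
# a shared recent "British" drift component, so they are essentially identical.
POPS = {
    "Kent": {"WHG": 0.15, "EEF": 0.40, "Steppe": 0.45, "British": 1.0},
    "Cornwall": {"WHG": 0.15, "EEF": 0.40, "Steppe": 0.45, "British": 1.0},
    "WHG": {"WHG": 1.0},
    "EEF": {"EEF": 1.0},
}


def shared_drift(p, q):
    """Shared drift (covariance) of two populations under the toy additive model."""
    return sum(POPS[p].get(k, 0.0) * POPS[q].get(k, 0.0) * v for k, v in DRIFT.items())


def affinity(test_pop, ref, opposed_to):
    """How much closer test_pop is to ref *as opposed to* opposed_to."""
    return shared_drift(test_pop, ref) - shared_drift(test_pop, opposed_to)


for pop in ("Cornwall", "WHG"):
    print(pop, round(affinity(pop, "Kent", "EEF"), 4))
# Cornwall: its EEF share cancels much of its closeness to Kent.
# WHG: it shares nothing with EEF in this model, so nothing is subtracted.
```

Cornwall ends up at about 0.085 and WHG at 0.15, so WHG gets the more extreme score even though Cornwall is the one that is "the same population" as Kent.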
 
I forgot to post this earlier. To understand how D-stats work, you need to read this: Understanding D-stats

The most important word to understand in D-stats is *opposed*.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bergamo-Italy is closer to Anatolia_Neolithic than Sicily is, but when their affinity to Anatolia_Neolithic is *opposed* to affinity to German_Beaker, Sicily will give a more significant score towards Anatolia_Neolithic. This is not because Sicily is closer to Anatolia_Neolithic; it is because Bergamo-Italy also has affinity with German_Beaker that Sicily lacks.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Testing for Non-EEF in Italy

I've started using an alternative form of D-stat. Usually D-stats take the form D(Outgroup, PopulationA, PopulationB, PopulationC), to decipher whether A is closer to B or to C. Instead, I fill all four positions with test populations. I explain why this different form is useful in Strategy #1 here: Alternative D-stats strategies.
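With all four positions filled by populations, D(W, X; Y, Z), the sign is what tells you which pairs share extra alleles. As I understand the usual ADMIXTOOLS convention (numerator (w - x)(y - z)), a positive D means W shares more with Y and/or X shares more with Z than a simple ((W, X), (Y, Z)) tree predicts, and a negative D means the reverse pairings. The helper below is a hypothetical sketch that turns a D value and its Z-score into that sentence. The |Z| >= 3 cut-off is a common rule of thumb rather than anything from this thread, and the Z-score in the usage line is made up because the thread doesn't report one.

```python
def interpret_d(w, x, y, z, d, zscore, threshold=3.0):
    """Describe D(W, X; Y, Z) under the (w - x)(y - z) sign convention."""
    label = f"D({w}, {x}; {y}, {z}) = {d:+.4f}"
    if abs(zscore) < threshold:
        return f"{label}: not significant (|Z| < {threshold})"
    if d > 0:
        pairs = f"{w} with {y}, and/or {x} with {z}"
    else:
        pairs = f"{w} with {z}, and/or {x} with {y}"
    return f"{label}: excess allele sharing of {pairs}"


# D value from the Tuscany test in this thread; the Z-score of -4.0 is hypothetical.
print(interpret_d("Remedello_BA", "Italian_Tuscan", "Kotais", "Italian_Bergamo", -0.0139, -4.0))
```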

Results using this strategy are in Sheet "Test #2, Italy" of this spreadsheet: D-Stats Ideas

Chad Rolfhsen ran stats for me with Italians to test whether, compared with Copper Age samples from the northern tip of Italy, they have more German_Beaker affinity as *opposed* to Anatolia_Neolithic, and more West Asian affinity as *opposed* to German_Beaker.

Here's a preview: they confirm that modern Italians aren't the same as the Copper Age northern ones. It could also mean they have German_Beaker and West Asian ancestry, but there's no way to know from these stats; F3-stats could help determine that.

-------------------------------------------------------------------------------------------------

Long Explanation of Stats

As far as I can tell, the stats reveal the following:

1: Anatolia_Neolithic Affinity Opposed to German_Beaker: There's a South-North trend in this affinity. Bergamo has the most, Tuscany 2nd, Sicily 3rd, and South Italy 4th. However, Copper Age_Italy's affinity to Anatolia_Neolithic as opposed to German_Beaker is still higher than South Italy's.

2: German_Beaker Affinity Opposed to West Asian: This has exactly the same South-North trend: Bergamo has the most, Tuscany 2nd, Sicily 3rd, and South Italy 4th. Copper Age Italy has more affinity to German_Beaker, as opposed to West Asians, than any modern Italians, except maybe Bergamo.

What 1 and 2 Reveal.

Most importantly, modern Italians and Copper Age North Italians are not a clade: moderns have ancestry that Copper Age Italians lacked. That ancestry is either German_Beaker- and West Asian-related, or something unrelated to both.

1: If modern Italians were exactly as close to Anatolia_Neolithic, as opposed to German_Beaker, as Copper Age Italians are, the result would be 0.000. Instead the results are all negative. These stats don't tell us whether this is because of German_Beaker-like ancestry or not.

2: Similarly, if modern Italians were exactly as close to German_Beaker, as opposed to West Asians, as Copper Age Italians are, the results would be 0.000. Instead the results are all positive, meaning Copper Age Italy is closer to German_Beaker, as opposed to West Asians, than modern Italians are. These stats don't tell us whether this is because of West Asian-like ancestry or not.

Of the West Asians, all Italians are closest to Georgians when opposed to German_Beaker. The next closest are also from the Caucasus: Armenians and Kotais (the CHG man). This could mean that if there's West Asian admixture in Italy, it's more similar to the Caucasus (CHG) than to Southwest Asians (Druze, Bedouin). But a more likely explanation IMO is that the African admixture in Southwest Asians pulls them away.
 
All very interesting, but I fail to see what is revealed that wasn't previously known. Who ever said that Remedello and modern Northern Italians are identical? Perhaps someone else has been making the argument that at least Northern Italians are unchanged since the Copper Age, but I certainly haven't. There has obviously been some admixture with other groups. Admixture analyses and PCAs have already told us that. The question is when, with whom, and how much. This analysis does nothing to elucidate that.

Also, with all due respect, Fire-Haired, I am not convinced that amateurs are any better at using D-Stats than some of them are at using Admixture.
 
All very interesting, but I fail to see what is revealed that wasn't previously known. Who ever said that Remedello and modern Northern Italians are identical? Perhaps someone else has been making the argument that at least Northern Italians are unchanged since the Copper Age, but I certainly haven't. There has obviously been some admixture with other groups. Admixture analyses and PCAs have already told us that. The question is when, with whom, and how much. This analysis does nothing to elucidate that.

We've got to start from scratch with formal stats. I'm not making many assumptions. I need D-stats to confirm some of the assumptions we have, and then move on from there. There's also a lot of trial and error, and there are things D-stats can never tell us.

Also, with all due respect, Fire-Haired, I am not convinced that amateurs are any better at using D-Stats than some of them are at using Admixture.

That's a big reason why I'm starting to look at formal stats more than before; previously I relied on ADMIXTURE and PCA. Formal stats are much more straightforward and less dependent on the people running them.
 
Dis-Correlation and Correlation in Cypriot and EEF Affinity

Results are from this spreadsheet: D-stats with 4-ancestors of Europeans

Below are results from D(Chimp, Cypriot, Mbuti, European) and D(Chimp, LBK_EN, Mbuti, European). Most of the same Europeans appear in both charts. Kosovars, Macedonians, Montenegrins, Serbians, and Bosnians rank 17-21 in the EEF (LBK_EN) chart but are all within the top 7 in the Cypriot chart. South Italy also has a high Cypriot stat but wasn't tested for the EEF stat.

Neolithic Europeans have an even higher stat in the Cypriot chart; however, I don't think that can explain the combination of high Cypriot and low EEF. Furthermore, the moderns with the highest EEF stats have significantly lower Cypriot stats than the Balkan populations.

The Cypriot stat for EEF and the EEF stat for Cypriots make it clear there's a close relationship between EEF and Cypriots. So to some degree the Cypriot stat correlates with EEF, but the Balkan/South Italy pattern breaks that correlation.

D(Chimp, Cypriot, Mbuti, European)
Rank  Population                  Cypriot
1     Kosovar                     40.71
2     Macedonian                  40.52
3     Montenegrin                 40.59
4     Serbian                     40.56
5     Dutch                       40.51
6     Croatian                    40.48
7     Bosnian                     40.46
8     South Italy                 40.35
9     Sardinia                    40.04
10    Bergamo                     39.99
11    Spain_Cantabria             39.50
12    Albanian                    39.82
13    Greek                       39.78
14    Tuscany                     39.78
15    French_South                39.78
16    Spain_Aragon                39.74
17    Spain_Castilla_la_Mancha    39.64
18    Bulgarian                   39.61
19    English Cornwall            39.59
20    French                      39.59
21    Norwegian                   39.51
22    Spain_Cantabria             39.50
23    Czech                       39.45
24    Hungarian                   39.44
25    Orcadian                    39.41
26    Estonian                    39.40
27    Ukrainian_East              39.37
28    EastSicily                  39.28
29    Spain_Andalucia             39.27
30    Belorussian                 39.26

D(Chimp, LBK_EN, Mbuti, European)
Rank  Population                  LBK_EN
1     Sardinia                    42.00
2     Bergamo                     41.46
3     Albanian                    41.27
4     Spanish_Aragon              41.27
5     Tuscany                     41.19
6     Greek                       41.13
7     Spanish_Castilla_la_Mancha  41.03
8     Croatian                    40.94
9     Bulgarian                   40.94
10    Norwegian                   40.81
11    Czech                       40.79
12    Hungarian                   40.76
13    Ukrainian_East              40.64
14    EastSicily                  40.58
15    Belorussian                 40.50
16    Estonian                    40.50
17    Kosovar                     40.44
18    Macedonian                  40.25
19    Serbian                     40.24
20    Montenegrin                 40.21
21    Bosnian                     40.12
22    Dutch                       40.07
23    Polish                      39.90
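The "correlates to some degree, but the Balkans break it" point can be put into a number with a plain Pearson correlation over the populations listed in both charts. This is a minimal sketch using the scores exactly as listed above; I'm assuming Spain_Aragon/Spanish_Aragon and the two Castilla_la_Mancha labels are the same populations, and the excluded group is the five populations the post names.

```python
import math

# (Cypriot stat, LBK_EN stat) for populations listed in both charts above.
SCORES = {
    "Kosovar": (40.71, 40.44), "Macedonian": (40.52, 40.25),
    "Montenegrin": (40.59, 40.21), "Serbian": (40.56, 40.24),
    "Croatian": (40.48, 40.94), "Bosnian": (40.46, 40.12),
    "Dutch": (40.51, 40.07), "Sardinia": (40.04, 42.00),
    "Bergamo": (39.99, 41.46), "Albanian": (39.82, 41.27),
    "Greek": (39.78, 41.13), "Tuscany": (39.78, 41.19),
    "Spain_Aragon": (39.74, 41.27), "Spain_Castilla_la_Mancha": (39.64, 41.03),
    "Bulgarian": (39.61, 40.94), "Norwegian": (39.51, 40.81),
    "Czech": (39.45, 40.79), "Hungarian": (39.44, 40.76),
    "Estonian": (39.40, 40.50), "Ukrainian_East": (39.37, 40.64),
    "EastSicily": (39.28, 40.58), "Belorussian": (39.26, 40.50),
}
# The five populations singled out above (ranked 17-21 in the LBK_EN chart).
BALKAN_OUTLIERS = {"Kosovar", "Macedonian", "Montenegrin", "Serbian", "Bosnian"}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def correlation(exclude=()):
    """Correlation of Cypriot vs LBK_EN stats, optionally leaving populations out."""
    pairs = [v for k, v in SCORES.items() if k not in exclude]
    cyp, lbk = zip(*pairs)
    return pearson(cyp, lbk)


print("all shared populations:", round(correlation(), 3))
print("without the five Balkan populations:", round(correlation(BALKAN_OUTLIERS), 3))
```

Comparing the two numbers shows how much of the overall (dis)correlation comes from that one group. Note that Dutch sits in a similar spot (high Cypriot, low LBK_EN), so it's worth trying it in the excluded set too.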
 
These stats support the idea there's EHG in Italy.

1. Remedello_BA Italian_Tuscan Karelia_HG Loschbour -0.0253
2. Chimp Karelia_HG Samara_HG Italian_Tuscan -0.1127
3. Chimp Karelia_HG Samara_HG Remedello_BA -0.1172

Both Remedello_BA and Tuscans are closer to WHG than to EHG, but Remedello_BA is significantly closer to WHG, as opposed to EHG, than Tuscans are. The simplest explanation is EHG ancestry and less WHG ancestry in Tuscans. Stats 2 and 3 show that one EHG sample (Karelia_HG) is closer to another (Samara_HG) when opposed to Remedello_BA than when opposed to Tuscans. The difference isn't very significant, but it does suggest EHG shares ancestry with Tuscans that it doesn't share with Remedello_BA (even though Remedello_BA is more WHG).

This stat supports the idea there's CHG in Italy.
Remedello_BA Italian_Tuscan Kotais Bulgarian -0.0105

Both Remedello_BA and Tuscans are closer to Bulgarians than to Kotais. However, Remedello_BA is significantly closer to Bulgarians, as opposed to Kotais, than Tuscans are. In overall makeup, Tuscans should be more similar to Bulgarians than Remedello_BA is, since they should share a lot of the same non-EEF ancestry. If I put in Bergamo_Italy, very close neighbors and relatives of Tuscans, instead of Bulgarians, the score would probably still be negative. I'm going to try that, and if it is negative, we can be very confident there's CHG ancestry in Tuscans. I'll run the same test replacing CHG with EHG.

These stats, on the other hand, don't support CHG ancestry in Tuscans.
Chimp Satsurblia Kotais Italian_Tuscan -0.0906
Chimp Satsurblia Kotais Remedello_BA -0.0828

Instead, they suggest CHG shares ancestry with Remedello_BA that it doesn't share with Tuscans.
 

Fire-Haired, I'm not saying this as some form of "I told you so" ego post, but this was clear from the days of Dienekes' calculators, which showed clearly that some southeastern Europeans had more "West Asian" than Sicilians (and this after the ancestry was diluted by the Slavic migrations), and certainly more than Northern Italians. Italians, on the other hand, like Spaniards, have a higher proportion of EEF or Anatolian farmer related ancestry. (Trust me, I know the difference between West Asian and CHG.)

I've been saying as much for years, but apparently either no one else was paying attention, or it was discounted.

That said it's good to see these other analyses bearing it out.
 
Proof of CHG/EHG in Tuscany and Greece

Significant signal of EHG and CHG in Tuscany. No signal of Cypriot-type stuff.
Remedello_BA Italian_Tuscan Kotais Italian_Bergamo -0.0139
Remedello_BA Italian_Tuscan Karelia_HG Italian_Bergamo -0.0169
Remedello_BA Italian_Tuscan Karelia_HG Cypriot -0.0022

Significant signal of EHG and CHG in Greece. No signal of Cypriot-type stuff.
Hungary_EN Greek Macedonian Karelia_HG 0.0104
Hungary_EN Greek Macedonian Kotais 0.0156
Hungary_EN Greek Macedonian Cypriot -0.0028

Significant signal of CHG in Spain_Cantabria. Weak signal of EHG and no signal of Cypriot-type stuff.
Spain_MN Spanish_Cantabria Spanish_Andalucia Karelia_HG 0.0072
Spain_MN Spanish_Cantabria Spanish_Andalucia Kotais 0.0102
Spain_MN Spanish_Cantabria Spanish_Andalucia Cypriot 0.0009

Fairly weak signal of EHG in Basque_French; no signal of CHG or Cypriot-type stuff. I should use Spain_CA instead of French_South next time; it'll probably confirm EHG and CHG.
Spain_MN Basque_French French_South Karelia_HG 0.0025
Spain_MN Basque_French French_South Kotais -0.0008
Spain_MN Basque_French French_South Cypriot -0.0073

Signal of NW African in Spain_Galicia.
Spanish_Aragon Spanish_Galicia French_South Algerian 0.0095

I say proof because I used a bulletproof method to confirm both CHG and EHG in Tuscany and Greece. Eventually I'll use it on dozens more ethnicities.

This is how the method works. For example, I test whether Copper Age Italy (Remedello_BA) is closer to Bergamo_Italy, as opposed to EHG/CHG, than Tuscans are. It turns out Copper Age Italy is significantly closer to Bergamo_Italy, as opposed to EHG/CHG, than Tuscans are. This is despite the fact that Bergamo and Tuscany share historical-era ancestry, are from the same region, and are overall very similar. The reason is that Tuscans share ancestry with EHG/CHG that Copper Age Italy lacks.

See, this method is absolutely bulletproof. You can't argue that the result is because the EEF sample is simply more similar to one of the two modern neighbors. The only possible explanation is that the modern population has CHG/EHG ancestry.

So this isn't coming from ADMIXTURE or PCA. These are irrefutable, completely unbiased D-stats. I'll test a large array of West Eurasian pops in place of EHG/CHG, to check whether there are better proxies for non-EEF admixture in South Europe, just to be sure and not assume it was CHG/EHG. I'm very confident EHG/Steppe and Caucasus will come out strongest.
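Swapping a large array of candidate sources in place of EHG/CHG is easy to script. The sketch below keeps the Copper Age reference, the test population and the neighbor fixed and swaps in each candidate, using the same layout as the Tuscany tests above. It assumes a qpDstat-style population list with one whitespace-separated "W X Y Z" quadruple per line, which is how I understand ADMIXTOOLS' popfilename input; the candidate names are just labels that appear in this thread.

```python
def source_swap_quadruples(ancient, test, neighbor, candidates):
    """Build (ancient, test, candidate, neighbor) quadruples, one per candidate source.

    Same layout as: Remedello_BA Italian_Tuscan <source> Italian_Bergamo
    """
    return [
        (ancient, test, source, neighbor)
        for source in candidates
        if source not in (ancient, test, neighbor)
    ]


# Candidate proxies for non-EEF ancestry (labels used elsewhere in this thread).
CANDIDATES = ["Kotais", "Satsurblia", "Karelia_HG", "Samara_HG", "Cypriot",
              "Georgian", "Armenian", "Druze", "Bedouin", "Algerian"]

quads = source_swap_quadruples("Remedello_BA", "Italian_Tuscan", "Italian_Bergamo", CANDIDATES)
# One quadruple per line, ready to paste into a population list file.
popfile_text = "\n".join(" ".join(q) for q in quads) + "\n"
print(popfile_text, end="")
```

Keeping everything but the source fixed is what makes the results comparable: any difference in D between candidates comes from how well each one matches the non-EEF ancestry.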
 
EEF/CHG/WHG/Steppe ratios from D-stats

I'm not sure exactly how he did it, but David has found a way to get ancestry percentages with D-stats. It looks like he used a strategy similar to the one F4-stats use. F4-stats are what Haak 2015 and Lazaridis 2014 used to get ancient ancestry percentages. David also used 4mix, which is like the Oracles at GEDmatch: it measures how well those ancestry percentages explain the population's behavior in the D-stats. The results are all very good fits and make a lot of sense. No assumptions or bias can change these results.

(Image: Anatolia-CHG-WHG-EBA_Steppe.png)
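For reference, the textbook way to turn f4-stats into an ancestry percentage is the f4-ratio (Patterson et al. 2012): with A and B related populations on one side, C on the other, O an outgroup, and X a mixture of B-related and C-related ancestry, alpha = f4(A, O; X, C) / f4(A, O; B, C) estimates the B-related share of X. This is only a minimal illustration of that estimator, not a reconstruction of what David actually did (as said above, that isn't clear), and the allele frequencies are made up.

```python
from statistics import fmean


def f4(a, b, c, d):
    """Uncorrected f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return fmean((p - q) * (r - s) for p, q, r, s in zip(a, b, c, d))


def f4_ratio(a, o, x, b, c):
    """Estimated B-related share of X: f4(A, O; X, C) / f4(A, O; B, C)."""
    denom = f4(a, o, b, c)
    if denom == 0:
        raise ZeroDivisionError("f4(A, O; B, C) is zero, so the references carry no signal")
    return f4(a, o, x, c) / denom


# Made-up allele frequencies at six SNPs.
o = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
a = [0.20, 0.70, 0.40, 0.90, 0.30, 0.60]
b = [0.25, 0.65, 0.35, 0.85, 0.20, 0.70]
c = [0.80, 0.30, 0.60, 0.20, 0.70, 0.40]
# X built as an exact 30/70 mix of B and C, so the ratio should recover 0.3.
x = [0.3 * bi + 0.7 * ci for bi, ci in zip(b, c)]

print(round(f4_ratio(a, o, x, b, c), 6))
```

Because x - c = 0.3 (b - c) at every SNP, the numerator is exactly 0.3 times the denominator, which is the logic the estimator relies on.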
 

You don't see any problem with these results? 13% Caucasus in Bulgaria, 15% in Greeks? This contradicts every other analysis ever done.

Or, the Scots have 0% CHG? The Norwegians also have 57% Yamnaya compared to 40% for Belorussians?

Does no one apply any critical thinking to this person's work?

Where, also, is the clear and transparent documentation for the method?

My God, are some of you in the market for a bridge? There's one in Brooklyn in which you might be interested.

This is why most "technical" people should be kept as far from the levers of real power as possible. Just babes in the woods.
 
It's all trial and error. The results aren't meant to be taken literally; they're taken as trends and suggestions. Some results are goofy, but most make perfect sense, and the trends make perfect sense. The method is ingenious, accurate, and can't be manipulated.

EDIT: Angela, this method is very useful. Look at this. I chose my own set of outgroups (Dai, Egyptian, Ket, MA1, Georgian, WHG, and Druze). I got results of the form D(Chimp, test; Mbuti, outgroup) for CHG, WHG, EHG, EEF, and Yamnaya.

I then tried to model Yamnaya's scores as a 4-way mix of the CHG/WHG/EEF/EHG scores. Here's the result I got.

Yamnaya = 26% CHG + 19% LBK + 0% WHG + 55% EHG @ D = 0.0053

So, the results for modern Europeans should be taken seriously.

Here's an attempt to model CHG as a mixture, which didn't turn out very well. Still, the idea is that CHG is mostly something remotely similar to EEF, with ANE influence.

CHG = 0% EHG + 82% LBK + 0% WHG + 18% MA1 @ D = 0.0291
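The "4-way mix @ D" fits above look like an Oracle-style search: find non-negative percentages, summing to 100, whose weighted average of the sources' D-stat vectors comes closest to the target's vector. I don't know 4mix's internals, so this is a hypothetical sketch of that general idea, using a brute-force 1% grid and Euclidean distance (the "D" reported above may be a different distance measure). The seven numbers per source stand in for D(Chimp, test; Mbuti, outgroup) over the seven outgroups listed above, but they are invented placeholders, not real results.

```python
import math

# Invented placeholder scores, one per outgroup
# (Dai, Egyptian, Ket, MA1, Georgian, WHG, Druze). NOT real results.
SOURCES = {
    "CHG": [0.010, 0.030, 0.012, 0.020, 0.045, 0.005, 0.040],
    "LBK": [0.008, 0.035, 0.006, 0.010, 0.030, 0.020, 0.038],
    "WHG": [0.004, 0.010, 0.015, 0.025, 0.012, 0.060, 0.008],
    "EHG": [0.015, 0.005, 0.030, 0.050, 0.018, 0.040, 0.006],
}


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def mix(percentages, sources):
    """Weighted average of the source vectors (percentages sum to 100)."""
    vecs = list(sources.values())
    return [sum(p / 100 * v[j] for p, v in zip(percentages, vecs)) for j in range(len(vecs[0]))]


def fit_mixture(target, sources):
    """Brute-force 1% grid search minimising Euclidean distance to `target`."""
    best_dist, best_pct = math.inf, None
    for pct in compositions(100, len(sources)):
        dist = math.dist(mix(pct, sources), target)
        if dist < best_dist:
            best_dist, best_pct = dist, pct
    return dict(zip(sources, best_pct)), best_dist


# Self-check: a target built as exactly 26/19/0/55 should be recovered at distance 0.
target = mix((26, 19, 0, 55), SOURCES)
weights, dist = fit_mixture(target, SOURCES)
print(weights, f"@ distance = {dist:.4f}")
```

A brute-force grid is slow compared to a proper constrained least-squares solver, but with 4 sources at 1% steps it's only about 177,000 candidates and it never gets stuck in a local minimum.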
 