Modelling Admixture with PCA

Fire Haired14

I'm opening this thread for postings of admixture proportions from a PCA. I already have a thread for posting admixture proportions from D-stats, here.
 
David Wesolowski at Eurogenes has created a global PCA with 50 dimensions. PCA plots normally show only 2 dimensions. His PCA has so many dimensions that it is impossible to visualize. He posted the coordinates of all the global samples in all 50 dimensions, which can be used to produce admixture results.
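
For anyone who wants to see the idea in code, here is a minimal sketch. It assumes each population is summarised by its average PCA coordinates and that a mixed population sits at the weighted average of its sources. It tries every whole-percent mix and keeps the one closest (by Euclidean distance) to the target. The coordinates are made up and use 3 dimensions rather than 50, and `fit_grid` and the other names are hypothetical; this is not the code of any actual tool.

```python
import math


def mix(sources, weights):
    """Weighted average of source coordinate vectors (weights sum to 1)."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def fit_grid(target, sources, step=1):
    """Try every mix in `step`-percent increments; return (percentages, distance)."""
    best = None
    for combo in compositions(100 // step, len(sources)):
        weights = [c * step / 100 for c in combo]
        d = distance(target, mix(sources, weights))
        if best is None or d < best[1]:
            best = ([c * step for c in combo], d)
    return best


# Made-up 3-dimensional coordinates, for illustration only.
sources = [[0.10, 0.02, -0.01], [-0.05, 0.08, 0.03], [0.00, -0.04, 0.06]]
target = mix(sources, [0.50, 0.30, 0.20])  # a synthetic 50/30/20 population
percentages, d = fit_grid(target, sources)
print(percentages, round(d, 6))
```

The distance of the best mix is a rough measure of fit: a model whose sources cannot reproduce the target leaves a larger distance.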

I've been playing around with Europe, and the results are 100% consistent with everything we've seen before.

Here are some results.
Population   Yamnaya_Samara   Loschbour   Sweden_MN   D statistic
Czech        51               4           45          0.0097
Belarusian   55               13          32          0.0172
Lithuanian   56               21          23          0.0177
Norwegian    49               7           44          0.0082
Andronovo    73               3           24          0.0105


Population             Cypriot   Hungary_CA   Mozabite   Yamnaya_Samara   D statistic
Italian_Bergamo        10        66           0          24               0.0148
Italian_EastSicilian   48        37           2          13               0.0096
Italian_Tuscan         30        48           0          22               0.0109
Italian_WestSicilian   46        37           2          15               0.0098


Population                    Cypriot   Iberia_Chalcolithic   Mozabite   Yamnaya_Samara   D statistic
Spanish_Andalucia             22        51                    3          24               0.0081
Spanish_Aragon                13        60                    1          26               0.0079
Spanish_Baleares              20        50                    1          29               0.0094
Spanish_Cantabria             12        58                    1          29               0.0079
Spanish_Castilla_la_Mancha    20        55                    0          25               0.009
Spanish_Castilla_y_Leon       15        51                    7          27               0.0091
Spanish_Cataluna              15        52                    3          30               0.0103
Spanish_Extremadura           18        48                    6          28               0.0102
Spanish_Galicia               18        49                    5          28               0.0081
Spanish_Murcia                19        52                    3          26               0.0078
Spanish_Pais_Vasco            0         70                    0          30               0.0117
Spanish_Valencia              17        55                    1          27               0.0086
 
It's also too high for Spaniards, and for the same reason.

Also, this doesn't tell us anything about the gene flows that formed the Spanish and the Italians; there was no migration of "Cypriots" into those countries at the kind of levels which would be required to get these kinds of admixture results.

I'm not even sure that it's accurate for northern Europeans. The Yamnaya percentage might be inflated by high WHG/EHG genes that were absorbed as the actual Yamnaya type people moved across Europe.
 
I'm not even sure that it's accurate for northern Europeans. The Yamnaya percentage might be inflated by high WHG/EHG genes that were absorbed as the actual Yamnaya type people moved across Europe.

Yes, except that WHG in Basques as in Sardinians came mostly with EEF farmers, not Yamnaya people.
 
It's also too high for Spaniards, and for the same reason.

Also, this doesn't tell us anything about the gene flows that formed the Spanish and the Italians; there was no migration of "Cypriots" into those countries at the kind of levels which would be required to get these kinds of admixture results.

I'm not even sure that it's accurate for northern Europeans. The Yamnaya percentage might be inflated by high WHG/EHG genes that were absorbed as the actual Yamnaya type people moved across Europe.

Wasn't Yamnaya admixture already studied in a peer-reviewed paper, Haak (2015)?

Untitled3.png


David Wesolowski at Eurogenes has created a global PCA with 50 dimensions. PCA plots normally show only 2 dimensions. His PCA has so many dimensions that it is impossible to visualize. He posted the coordinates of all the global samples in all 50 dimensions, which can be used to produce admixture results.

I've been playing around with Europe, and the results are 100% consistent with everything we've seen before.

(...)

The source? A link?
 
Wasn't Yamnaya admixture already studied in a peer-reviewed paper, Haak (2015)?

Untitled3.png

Indeed, and by a more sophisticated method than a simple Admixture run.
 
Wasn't Yamnaya admixture already studied in a peer-reviewed paper, Haak (2015)?

The source? A link?

These results were created by amateurs, using the same tools academics use.

It's also too high for Spaniards, and for the same reason.

Also, this doesn't tell us anything about the gene flows that formed the Spanish and the Italians; there was no migration of "Cypriots" into those countries at the kind of levels which would be required to get these kinds of admixture results.

I'm not even sure that it's accurate for northern Europeans. The Yamnaya percentage might be inflated by high WHG/EHG genes that were absorbed as the actual Yamnaya type people moved across Europe.

Yamnaya has hardly any WHG, if any. WHG and EHG behave in significantly different ways in PCA and D-stats. Trust me, Iberia Middle Neolithic/Copper Age provides all the WHG Iberians need. Actually it gives too much, which is why they need Cypriot admixture. Cypriots are a proxy for unsampled Near Eastern ancestors of mostly Southern Europe, but also Central Europe. When you see basically the same results from two completely different methods (PCA and D-stats), you've got to admit there's something legitimate about the results.

In 500 BC, if they had discovered DNA, someone would have figured out that Europeans are mostly Scythian+Sardinian. It wouldn't mean Europeans literally had Scythian and Sardinian ancestry, but that they had ancestors who were similar to Scythians and Sardinians. We're in a similar situation in 2000 AD with Cypriots and Southern Europe.

Indeed, and by a more sophisticated method than a simple Admixture run.

They used the exact same method as D-stat admixture. They tested the relatedness of test populations to outgroups, then modeled modern test populations as a mixture of ancient ones. I think they used F-stats instead of D-stats (I don't know the difference between the two), but it's the same idea.

Haak 2015.
We estimate mixture proportions using a method that gives unbiased estimates even without an accurate model for the relationships between the test populations and the outgroup populations (SI9).
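
On the difference between the two: as defined in Patterson et al. (2012), f4 and D share the same numerator, and D divides it by a normalising term so that the value falls between -1 and 1. A minimal sketch with invented allele frequencies for four populations W, X, Y, Z (the function names are hypothetical, and real analyses use many thousands of SNPs plus a standard error):

```python
def f4(w, x, y, z):
    """f4(W, X; Y, Z): mean over SNPs of (w - x) * (y - z)."""
    products = [(wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z)]
    return sum(products) / len(products)


def d_stat(w, x, y, z):
    """D(W, X; Y, Z): same numerator as f4, normalised to lie in [-1, 1]."""
    numerator = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    denominator = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return numerator / denominator


# Invented allele frequencies at four SNPs, for illustration only.
w = [0.1, 0.8, 0.5, 0.3]
x = [0.2, 0.7, 0.5, 0.4]
y = [0.1, 0.9, 0.4, 0.2]
z = [0.3, 0.6, 0.4, 0.5]

f4_value = f4(w, x, y, z)
d_value = d_stat(w, x, y, z)
print(round(f4_value, 4), round(d_value, 4))
```

Both statistics have the same sign and are zero under the same conditions, which is why they tend to tell the same story.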
 
@Everyone,

There's no doubt Steppe admixture is significant for Spain and Italy. 20-30% from the Pontic-Caspian Steppe, not Ukraine where they had EEF admixture, is reasonable. When you model Spain and Italy as Steppe+WHG+EEF, you still get 20-30% Steppe, but a bad fit (because they need Cypriot or something similar).
 
I have my doubts
 
Here are better results. I used the nMonte system instead of 4mix. nMonte reports percentages to decimal places.

                  Anatolia_Neolithic   Yamnaya_Samara   Loschbour   Cypriot   @ D        Cycles
Italian_Tuscan    42.6                 25.5             6.4         25.5      0.007143   100
Scottish_Argyll   28.85                39.6             21.75       9.8       0.009478   100


Iberia           Iberia_Chalcolithic   Bell_Beaker_Germany   Cypriot   @ D
Spanish_Aragon   39.6                  45.9                  14.9      0.006734
Basque_Spanish   61.8                  38.2                  0         0.018822


Italy                  Iceman_MN   Bell_Beaker_Germany   Cypriot   @ D
Italian_Bergamo        29.45       48.6                  21.95     0.008345
Italian_Tuscan         16          42.25                 41.75     0.007458
Italian_WestSicilian   12.7        29.75                 57.55     0.010929

                  Anatolia_Neolithic   Yamnaya_Samara   Loschbour   @ D        Cycles
Scottish_Argyll   36.1                 44.25            19.65       0.009592   100
Italian_Tuscan    61.45                37.55            1           0.008183   100
Hungary_BA        41.4                 32.9             25.7        0.013439   100
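
A rough sketch of how decimal percentages can be reached without trying every combination. It assumes that a mixed population lies at the weighted average of its sources' PCA coordinates. This is a simple random hill-climb in 0.05% steps written for illustration; it is not nMonte's actual code, and the coordinates and function names are invented.

```python
import math
import random


def mix(sources, weights):
    """Weighted average of source coordinate vectors (weights sum to 1)."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_random_search(target, sources, units=2000, iterations=20000, seed=0):
    """Hill-climb over proportions held as integer units (2000 units = 0.05% each)."""
    rng = random.Random(seed)
    k = len(sources)
    counts = [units // k] * k
    counts[0] += units - sum(counts)  # give the remainder to the first source
    best = distance(target, mix(sources, [c / units for c in counts]))
    for _ in range(iterations):
        i, j = rng.sample(range(k), 2)  # move one unit from source i to source j
        if counts[i] == 0:
            continue
        counts[i] -= 1
        counts[j] += 1
        d = distance(target, mix(sources, [c / units for c in counts]))
        if d < best:
            best = d  # keep the move
        else:
            counts[i] += 1  # undo the move
            counts[j] -= 1
    return [100 * c / units for c in counts], best


# Made-up 3-dimensional coordinates, for illustration only.
sources = [[0.10, 0.02, -0.01], [-0.05, 0.08, 0.03], [0.00, -0.04, 0.06]]
target = mix(sources, [0.426, 0.255, 0.319])  # a synthetic mixed population
percentages, best = fit_random_search(target, sources)
print([round(p, 2) for p in percentages], round(best, 6))
```

Because the search is random, a different seed can stop at a slightly different mix with almost the same distance, especially when two sources are similar to each other.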
 
I don't doubt that there is steppe admixture in southern Europe. Just results from yDna indicate that this is the case, and, as has been pointed out, it was found through the extensive modeling presented in Haak et al.

What I'm questioning is the amount, which seems about right for certain groups and pretty off for others (I highly doubt that the Basques are 70% Yamnaya related). I'm also questioning modeling groups using a mixture of ancient and modern genomes. In this particular case, Cypriots have their own particular population history and admixture. Throwing them into the mix may be altering the other percentages in ways we can't really predict.

Why not just wait for aDna? It won't be long.

Oh, and last I heard there are problems with the use and interpretation of Dstats, and we have that from the people who know the most about them, like Patterson. All these methods have their limitations. We have to try to take into account the results from all of the tools at our disposal, not constantly proclaim that this is the definitive tool and these are the only correct conclusions.
 
I don't doubt that there is steppe admixture in southern Europe. Just results from yDna indicate that this is the case, and, as has been pointed out, it was found through the extensive modeling presented in Haak et al.

What I'm questioning is the amount, which seems about right for certain groups and pretty off for others (I highly doubt that the Basques are 70% Yamnaya related). I'm also questioning modeling groups using a mixture of ancient and modern genomes.

Basques came out 30% Yamnaya; the 70% is Iberia_Chalcolithic. BTW, what makes Basques unique in these tests (D-stats, ADMIXTURE, PCA) is that they lack the Near Eastern signal other Iberians and French have. They have as much Steppe admixture as the others.



In this particular case, Cypriots have their own particular population history and admixture. Throwing them into the mix may be altering the other percentages in ways we can't really predict.

In D-stats, Cypriots' unique history isn't expressed. This is because there are no Near Eastern outgroups. Cypriots' relationship to non-West Eurasian outgroups, European HGs, and CHG is basically the same as Anatolia_Neolithic's. The only difference is that Cypriot isn't as close to EEF outgroups as Anatolia_Neolithic is. So, Cypriot is like EEF but not EEF.

Using simple math, you can extract a "zombie" ancestor from test populations: subtract the known ancestor's contribution and see how what is left behaves. I did this for Georgians and Mozabites. Georgians' non-CHG side behaves similarly to Cypriot in D-stats. Mozabites' non-African side also behaves similarly to Cypriot. So, Cypriot in D-stats, with the outgroups used, is like a pure Near Eastern population. They're not EEF or CHG, and they don't appear to have exotic admixture (African, East Asian, South Asian, Steppe). That's why I use them as a Near Eastern ancestor proxy for South Europe.
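
The "simple math" can be sketched as follows, under two assumptions made here for illustration and not stated in the posts. First, the statistics of a mixed population are the weighted average of its ancestors' statistics (this holds for f4-type statistics and only approximately for D). Second, the share of the known ancestor is already known. The numbers and the function name are invented.

```python
def residual_ancestor(target, known, known_share):
    """Solve target = share * known + (1 - share) * unknown for the unknown vector."""
    if not 0 <= known_share < 1:
        raise ValueError("known_share must be in [0, 1)")
    return [(t - known_share * k) / (1 - known_share) for t, k in zip(target, known)]


# Invented statistics against three outgroups, for illustration only.
known = [0.020, -0.010, 0.030]    # the sampled ancestor
unknown = [-0.010, 0.040, 0.000]  # the unsampled "zombie" ancestor
share = 0.4                       # assumed share of the known ancestor

# Build a synthetic mixed population, then recover the unsampled ancestor.
target = [share * k + (1 - share) * u for k, u in zip(known, unknown)]
recovered = residual_ancestor(target, known, share)
print([round(r, 4) for r in recovered])
```

The recovered vector can then be compared with a candidate proxy. The result is only as good as the assumed share, and a share that is off shifts every recovered value.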

In other methods, Lebanese or Turkish work better than Cypriot for South Europe. In D-stats however Cypriots work better.

Why not just wait for aDna? It won't be long.

Oh, and last I heard there are problems with the use and interpretation of Dstats, and we have that from the people who know the most about them, like Patterson. All these methods have their limitations. We have to try to take into account the results from all of the tools at our disposal, not constantly proclaim that this is the definitive tool and these are the only correct conclusions.

I'm impatient. It's been a year since any big ancient DNA paper was published. I've used D-stats, PCA, and ADMIXTURE(not just ANE K8, but also old ones based on moderns), and have gotten the same results.
 
Wasn't there a recent paper indicating that the Balkan E haplogroups originated in Cyprus, and that they had a lot of EEF? IIRC, these 2 major E haplogroups came into Cyprus via the northern Levant.
 
I don't doubt that there is steppe admixture in southern Europe. Just results from yDna indicate that this is the case, and, as has been pointed out, it was found through the extensive modeling presented in Haak et al.

What I'm questioning is the amount, which seems about right for certain groups and pretty off for others (I highly doubt that the Basques are 70% Yamnaya related). I'm also questioning modeling groups using a mixture of ancient and modern genomes. In this particular case, Cypriots have their own particular population history and admixture. Throwing them into the mix may be altering the other percentages in ways we can't really predict.

Why not just wait for aDna? It won't be long.

Oh, and last I heard there are problems with the use and interpretation of Dstats, and we have that from the people who know the most about them, like Patterson. All these methods have their limitations. We have to try to take into account the results from all of the tools at our disposal, not constantly proclaim that this is the definitive tool and these are the only correct conclusions.


I agree. All these methods give us, internally, some respective weights of "components" between populations, not absolute percentages, which vary with every run. And like you, I don't much like these groupings of reference populations of different ages. The results, taken with caution, can tell us something about global ancient affinities, but very little about the history of population movements (who? when?).
 
I'm not so astonished by the respectable supposed weight of "Yamnaya" or "steppic" ancestry among Basques (and among Iberians at a lower level). Looking at the admixture results, PCAs and Y-haplogroups from diverse studies, I see some input from the north into Iberia, even if some of the proximity can be explained by the partly remaining WHG input, which is not so different from the hunter-gatherer element among "steppic" people. But I admit I don't handle all these D-stats and the like too well...
 


I agree. All these methods give us, internally, some respective weights of "components" between populations, not absolute percentages, which vary with every run. And like you, I don't much like these groupings of reference populations of different ages.

The reason the scores change is that the different ancestors are made up of the same older components. Cypriot and EEF are made up of a lot of the same stuff, so Cypriot can swallow some EEF and vice versa. I explained to Angela why, in D-stats (not PCA), Cypriot is a good Near Eastern proxy: their unique recent history has no effect.

The results, taken with caution, can tell us something about global ancient affinities, but very little about the history of population movements (who? when?).

You've got to understand that the genetic diversity in West Eurasia 10,000 years ago was much greater than it is today. Modern-day Europeans are all a fusion of the same Holocene-era populations. We can differentiate different ethnic ancestries in Europe today, but no population in Europe has accumulated thousands of years of unique evolution, as Holocene-era West Eurasians had.

In these tests, the main source of diversity in Europe is different ratios of ancestry from the same Holocene-era populations. Local ethnic and regional ancestry has hardly any effect. There's really no difference between English and Czech, or between Tuscan and Spanish, even though they have no ethnic or regional connections. We can't identify the exact ethnic or regional origin of the people who migrated into Spain and Italy after 3000 BC. However, with genomes from Spain and Italy dating to 3000 BC, we can get a good idea of which large genetic groupings the people who migrated into both places after 3000 BC were a part of. We can't get ethnic, regional or cultural information, because it has only a small effect on DNA.
 
Look at these results. This is a totally unbiased approach, to find what changed in Spain, Italy, Greece, Norway, and Turkey after 3000 BC. I used people who lived in their regions in 3000 BC, other Europeans from 3000 BC(Yamnaya), and all modern non-Europeans as possible ancestors. For Italy and Spain, I only used moderns from the same regions as the 3000 BC genomes are from.

          Iceman_MN   Yamnaya_Samara   Adygei   Cypriot   Mozabite   BedouinB   Druze   @ D
Bergamo   60.85       27.4             5.45     4         2.3        0          0       0.011837
Tuscan    42.9        25.7             0        30.3      0.3        0.15       0.65    0.009941

Norway.
         Sweden_MN   Yamnaya_Samara   Chechen   Mozabite   BedouinB   @ D
Norway   47.1        52.15            0.3       0.25       0.2        0.011449

Spain.
            Iberia_Chalcolithic   Yamnaya_Samara   Adygei   Cypriot   Mozabite   Chechen   Druze   @ D
Cantabria   59.05                 28.1             2.55     8.25      1.25       0.15      0.65    0.007823

Greece.
          Anatolia_Neolithic   Yamnaya_Samara   Adygei   Cypriot   Druze   Abkhasian   Armenian   Druze   Georgian   Kotias   Lebanese   Palestinian   @ D
Greek     52.7                 36.35            2.75     7.65      0.55    0           0          0.55    0          0        0          0             0.007823
Turkish   25.2                 0                0        0         0       13.25       4.5        8.3     27.1       16.85    1.15       3.65          0.01017
 
Clearly this makes sense. Just looking at Bergamo and Tuscan, you can see the North Caucasus association with Bergamo and the Cypriot association with the Tuscans.

Good job.
 
