The spread of 'Steppe' DNA and autosomal best-fit analysis

What markod says. Also, ADMIXTURE is quite a rough approach to genetic mixture, especially if you mix older populations with modern-day populations. Also, to fully appreciate what it states you need to weigh in other K values as well. ADMIXTURE goes a bit like this: when *forced* to be modeled as a combination of two ancestral components, how would the samples look (that is K=2), and when *forced* to be modeled as a combination of three components (K=3), and so on. Afterwards the statistically best fit is chosen, if run in unsupervised mode. Even in the unsupervised case a lot of individual samples will simply be a forced bad fit.

The tools that the Reich lab created (f3-stats, D-stats and qpAdm, in the ADMIXTOOLS package) are available but rather complicated to use. Also, you'd probably need the full samples for those, which take up a huge amount of disk space, but if you choose that path and are willing to face a steep learning curve you can download it here.

Eurogenes Davidski provides a simpler tool called nMonte, created by Huijbrechts (more explanation here), that allows modeling on the basis of pre-calculated PCA values. Its conclusions are consistently similar to those of qpAdm from the Reich lab tools. You have to install "R", though.
Thanks for this.
Does it produce substantially different results, do you know? And if so, what are they?
The striking thing about the autosomal analysis I have carried out is that it provides pretty similar results to yDNA and mtDNA analysis that I have undertaken using different methodologies.
 
If you can replicate that with D-stats or f3-stats or qpAdm, yes. But it could also simply be an artifact.
Yes, all is speculative and could be artifact. We have a limited range of data, and can never in any case know for certain who bred with whom (and where) several thousand years ago.
 
Thanks for this.
Does it produce substantially different results, do you know? And if so, what are they?
The striking thing about the autosomal analysis I have carried out is that it provides pretty similar results to yDNA and mtDNA analysis that I have undertaken using different methodologies.

I don't know. I do know that the 30% EEF + 70% Yamnaya for Corded Ware pops up in many different approaches. But do take into consideration that more than one model may fit, as different proposed source populations are themselves related to each other. For instance, Khvalynsk is often considered a partial ancestor of Yamnaya and thus may mop up lots of Yamnaya ancestry.
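To illustrate how a related source can mop up ancestry, here is a minimal sketch of a two-source least-squares fit on coordinates. It is not nMonte itself (which works on real PCA values and uses a Monte Carlo search); the closed-form fit, the function name `fit_two_sources` and all coordinates are invented for illustration only.

```python
import math


def fit_two_sources(target, src_a, src_b):
    """Least-squares weight w for target ~ w*src_a + (1-w)*src_b, with w in [0, 1].

    Returns (w, distance between target and the fitted mixture).
    """
    diff = [a - b for a, b in zip(src_a, src_b)]
    num = sum((t - b) * d for t, b, d in zip(target, src_b, diff))
    den = sum(d * d for d in diff)
    w = min(1.0, max(0.0, num / den))
    fitted = [w * a + (1 - w) * b for a, b in zip(src_a, src_b)]
    dist = math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))
    return w, dist


# Invented toy coordinates, NOT real PCA values for any ancient sample.
EEF = [1.0, 0.0, 0.0]
KHVALYNSK = [0.0, 1.0, 0.0]
OTHER = [0.0, 0.0, 1.0]  # hypothetical second ancestor of the Yamnaya-like source
YAMNAYA = [0.8 * k + 0.2 * o for k, o in zip(KHVALYNSK, OTHER)]

# Target built as exactly 30% EEF + 70% Yamnaya-like.
TARGET = [0.3 * e + 0.7 * y for e, y in zip(EEF, YAMNAYA)]

w_true, dist_true = fit_two_sources(TARGET, EEF, YAMNAYA)
w_alt, dist_alt = fit_two_sources(TARGET, EEF, KHVALYNSK)

print(f"EEF + Yamnaya-like:   EEF={w_true:.2f}, distance={dist_true:.4f}")
print(f"EEF + Khvalynsk-like: EEF={w_alt:.2f}, distance={dist_alt:.4f}")
```

In this toy setup the Khvalynsk-like source still receives most of the weight even though the target was built from the Yamnaya-like one. The fit distance is worse, but with noisy real data the gap between related sources can be much harder to see.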
 
I don't know. I do know that the 30% EEF + 70% Yamnaya for Corded Ware pops up in many different approaches. But do take into consideration that more than one model may fit, as different proposed source populations are themselves related to each other. For instance, Khvalynsk is often considered a partial ancestor of Yamnaya and thus may mop up lots of Yamnaya ancestry.
I did calculate the fit of all possible models, including potential source populations that are related to each other and combinations of these source populations. Khvalynsk provided a fit for early Steppe DNA in EC Europe that was substantially closer than its Yamnayan successor or indeed a mixture of the two.

As Khvalynsk evolved into Yamnaya, its autosomal changes moved it further away from the Steppe DNA that we see in core European populations. Similarly with EEF, the further we move from Anatolia towards the Corded Ware zone, the worse the fit that we see with the EEF in Corded Ware populations, suggesting that the EEF component in CW came from elsewhere.

The North Ukrainian R1a-M417 sample dated circa 4,000 BC provides the most striking evidence - it already had both the yDNA and autosomal mix typical of Corded Ware before Yamnaya had even come into existence. Its descendants did not need any Yamnayan admixture to turn them into what it seems they already were.

Unless there is evidence to indicate and explain why these best-fit results are likely to be incorrect, I would provisionally tend to go with them, rather than with possibilities that provide worse fits. Of course, we are only looking at a limited number of clusters of archaeological samples, and the real lineages of Chalcolithic populations are highly likely to be with people and communities for which we have no samples or data.
 
Unless there is evidence to indicate and explain why these best-fit results are likely to be incorrect, I would provisionally tend to go with them, rather than with possibilities that provide worse fits. Of course, we are only looking at a limited number of clusters of archaeological samples, and the real lineages of Chalcolithic populations are highly likely to be with people and communities for which we have no samples or data.

ADMIXTURE is not a good source and any model you get from it the way you do is pretty much useless unless replicated with another method. The big papers basically do an ADMIXTURE, verify with f3 or D-stats, and model with qpAdm. You may try and use your method and verify with nMonte, if you insist. Also, what you call a dataset is not that. It is basically *one* ADMIXTURE-run.
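As a rough idea of what an f3 check does, here is a minimal sketch of the admixture f3 statistic on allele frequencies, f3(Target; A, B) = mean((t - a)(t - b)). A clearly negative value points to the target being a mix of populations related to A and B. The frequencies are simulated, the population labels are only placeholders, and the real tools add a bias correction and block-jackknife standard errors that are left out here.

```python
import random


def f3(target, src1, src2):
    """Uncorrected f3(target; src1, src2) averaged over loci."""
    total = sum((t - a) * (t - b) for t, a, b in zip(target, src1, src2))
    return total / len(target)


rng = random.Random(1)
n_loci = 5000

# Simulated allele frequencies; the labels are placeholders, not real data.
eef = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
steppe = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

# A population built as a 30/70 mix of the two sources.
mixed = [0.3 * e + 0.7 * s for e, s in zip(eef, steppe)]

# A control that only drifted away from one source, with no admixture.
drifted = [min(1.0, max(0.0, e + rng.gauss(0, 0.05))) for e in eef]

f3_mixed = f3(mixed, eef, steppe)
f3_drifted = f3(drifted, eef, steppe)

print(f"f3(mixed; eef, steppe)   = {f3_mixed:.4f}")
print(f"f3(drifted; eef, steppe) = {f3_drifted:.4f}")
```

The mixed population comes out negative and the drifted-only control comes out positive, which is the kind of independent check a single ADMIXTURE run cannot give you.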

EDIT: One of the reasons ADMIXTURE is not very reliable the way you use it is that it uses Fst (genetic distance). That is fine, but differences in calculated genetic distance can be caused by more than simple ancestry. For instance, two populations (A and B), both highly drifted from a population C, will each show a higher Fst from C than a population resulting from a merge of A and B. That is because unique drift will be leveled out once A and B merge: both have been separated for as long as they have been separated from C, so some genes that weren't drifted in A merge with genes that drifted in B. This is why current-day Europeans have a lower Fst from Africans than any of their ancestors have. However, check with D-stats and no African population will choose current-day Europeans over any of their ancestors, which is a clear sign that the difference in Fst is not from extra African ancestry.
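The drift effect described above can be sketched with a small simulation. This assumes a simplified Hudson-style Fst on population allele frequencies (no sample-size correction) and models drift as Gaussian noise on invented frequencies, so treat it as a toy illustration rather than a real analysis.

```python
import random


def hudson_fst(p1, p2):
    """Hudson-style Fst from allele frequencies (ratio of sums, uncorrected)."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den


def drift(freqs, sd, rng):
    """Toy drift: add Gaussian noise to each frequency, clipped to [0.01, 0.99]."""
    return [min(0.99, max(0.01, p + rng.gauss(0, sd))) for p in freqs]


rng = random.Random(0)

# Invented frequencies for C; A and B drift from it independently.
pop_c = [rng.uniform(0.2, 0.8) for _ in range(5000)]
pop_a = drift(pop_c, 0.1, rng)
pop_b = drift(pop_c, 0.1, rng)

# A 50/50 merge of A and B, with no ancestry from anyone else.
merged = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

fst_a = hudson_fst(pop_a, pop_c)
fst_b = hudson_fst(pop_b, pop_c)
fst_merged = hudson_fst(merged, pop_c)

print(f"Fst(A, C)      = {fst_a:.4f}")
print(f"Fst(B, C)      = {fst_b:.4f}")
print(f"Fst(merged, C) = {fst_merged:.4f}")
```

The merged population ends up closer to C than either A or B, although it has exactly the same ancestry as both: the independent drift in A and B partly cancels when they are averaged.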
 
Yes, it could be; but, from the archaeological samples available, there were no fits so close as the ones I've identified. The matches with other core Pontic-Caspian samples like Sredny Stog and Yamnaya provide more divergent results; and the matches with EEF further away from the Bosphorus likewise.

The best-fit results actually match up well with what we know about the Steppe-like Suvorovo culture, which appears to have spread from Eastern Bulgaria between 4,300 and 4,000 BC in various directions northwards - to the Danube delta, and then (i) up the Danube into Northern Romania, (ii) up the Dniester into North Western Ukraine and (iii) up the Dnieper into East Central Ukraine.

I see, but, I mean, how likely is it that a R1b BB and a R1a CWC from, say, around 2500 BC would still have a very close match with a 5th millennium Bulgarian individual even after such intensive migrations, cultural changes and certainly lots of mixing with the local peoples (who, especially in the case of BB, they didn't seem to replace overwhelmingly)? Would very little mixing and autosomal change have happened in more than 1500 years even as the Bulgarian Suvorovo spread to lands very far away and already densely inhabited? I'd be very surprised if that did happen. I think the fact that the BB and CWC do not match as well with the Sredny Stog and the Yamnaya samples may just result from the very plausible fact that those were still "steppe proper", pre-expansion societies with much less admixture with the ANF+WHG mix dominant especially to the west of the Dniester.

I've seen some suggestion previously that Cernavoda, under Suvorovo-Novodanilovka influence, may have had some role in the spread of IE languages, but its very early dating and split from the steppe "proper" genetic/cultural horizon made people speculate it could have something to do with the Anatolian IE languages, because the non-Anatolian Late PIE stage has usually been assumed to have started splitting much later, around 3400-3000 BC. But they could be wrong, of course...
 
Never cite that quack Quiles. The crackpot believes that Corded Ware was Uralic; ridiculous. How about Yamnaya being Vasconic/Northwest Caucasian? See, I can come up with crank theories too!

It wouldn't be that huge a problem if he didn't also think that CWC had come from the Pontic-Caspian steppe (early Sredny Stog) where, just before the Neolithization of the region around 5000-4500 BC (approximate dates), PU and PIE would've formed a common homogeneous Indo-Uralic language that split later into the Khvalynsk PIE and Sredny Stog PU. That is, he really believes that PU and PIE were basically separated by just 1000-1500 years of linguistic divergence when they themselves started to split to form their own language families. :-o
 
Does he actually? That is really dumb, how can he believe that given the blatant correlation with Y DNA N1c?

Believe it or not, he thinks N1c and Siberian ancestry have nothing to do with the PU expansion. They're just later absorptions that took place in some Uralic-speaking areas, little more than a faint correlation. On the other hand, he thinks there is a strong correlation between PU and CWC and R1a-M417 (including the "Indo-Iranian" Z93, which according to him makes the Proto-Indo-Iranian community a mix of Yamnaya-derived PIE with Uralic CWC). It's a bit strange, given his clear knowledge about all the papers, that this really basic reasoning was missed by him: 1) with the exception of the clear outliers of the PU family (with a very "unusual" history, too), the Hungarians, all Uralic nations are rich in N1c or at least N1 and have at least some minor Siberian ancestry, but very few non-Uralic populations in Europe have a high frequency of N1, and all of them are neighbors to Uralic nations (what a coincidence); 2) and that CWC is found in heavy proportions in virtually all Uralic nations (Nganasans excluded), but CWC is also found in heavy or actually even heavier proportions in several non-Uralic nations, whereas Siberian ancestry is clearly found in stronger proportions in the Uralic nations than in other nations (even the N1-rich IE people like the Lithuanians have virtually no Siberian ancestry).
 
It wouldn't be that huge a problem if he didn't also think that CWC had come from the Pontic-Caspian steppe (early Sredny Stog) where, just before the Neolithization of the region around 5000-4500 BC (approximate dates), PU and PIE would've formed a common homogeneous Indo-Uralic language that split later into the Khvalynsk PIE and Sredny Stog PU. That is, he really believes that PU and PIE were basically separated by just 1000-1500 years of linguistic divergence when they themselves started to split to form their own language families. :-o

I believe the traditional argument that had Uralic & IE bordering each other on the steppe usually involved the weird IE forms in PU (*nimi– , *weti– etc.).

How can these be explained if Uralic expanded from a homeland presumably in the vicinity of Mongolia (under the Seima Turbino hypothesis)? They don't seem like words that would diffuse through trade or the like.
 
ADMIXTURE is not a good source and any model you get from it the way you do is pretty much useless unless replicated with another method. The big papers basically do an ADMIXTURE, verify with f3 or D-stats, and model with qpAdm. You may try and use your method and verify with nMonte, if you insist. Also, what you call a dataset is not that. It is basically *one* ADMIXTURE-run.
I think 'useless' is exaggerated, particularly as it is often verified with f3 or D-stats. In the absence of evidence to the contrary, it is at least better than nothing. It may be one admixture run, but it is a dataset in that the data has been obtained from many samples.

EDIT: One of the reasons ADMIXTURE is not very reliable the way you use it is that it uses Fst (genetic distance). That is fine, but differences in calculated genetic distance can be caused by more than simple ancestry. For instance, two populations (A and B), both highly drifted from a population C, will each show a higher Fst from C than a population resulting from a merge of A and B. That is because unique drift will be leveled out once A and B merge: both have been separated for as long as they have been separated from C, so some genes that weren't drifted in A merge with genes that drifted in B. This is why current-day Europeans have a lower Fst from Africans than any of their ancestors have. However, check with D-stats and no African population will choose current-day Europeans over any of their ancestors, which is a clear sign that the difference in Fst is not from extra African ancestry.
If I understand you correctly - if C's descendants A and B later merge, rather than admix with other populations, then they are likely to show a greater proportional descent from C. This is what I am measuring, rather than separation times.
 
I see, but, I mean, how likely is it that a R1b BB and a R1a CWC from, say, around 2500 BC would still have a very close match with a 5th millennium Bulgarian individual even after such intensive migrations, cultural changes and certainly lots of mixing with the local peoples (who, especially in the case of BB, they didn't seem to replace overwhelmingly)? Would very little mixing and autosomal change have happened in more than 1500 years even as the Bulgarian Suvorovo spread to lands very far away and already densely inhabited? I'd be very surprised if that did happen. I think the fact that the BB and CWC do not match as well with the Sredny Stog and the Yamnaya samples may just result from the very plausible fact that those were still "steppe proper", pre-expansion societies with much less admixture with the ANF+WHG mix dominant especially to the west of the Dniester.

We do not know how much people admixed until we examine the data. If we look at the R1a-M417 sample from circa 4,000 BC, for example, it differs little autosomally from the R1a-M417 samples in Corded Ware 1,500 years later. I would suggest this indicates that extant M417 admixed very little during this period, just as its females seemed to admix almost exclusively with M417 men. This is surely a more likely explanation than that it changed autosomally through admixture with Sredny Stog and Yamnaya before changing back to pretty much the same autosomal mix that it had before.

Sredny Stog and Yamnaya look to me like red herrings. They had different autosomal mixes from each other, and neither of their Steppe DNA mixes match the mixes in the Steppe components within BB or CW.
 
I think 'useless' is exaggerated, particularly as it is often verified with f3 or D-stats. In the absence of evidence to the contrary, it is at least better than nothing. It may be one admixture run, but it is a dataset in that the data has been obtained from many samples.

The way you use it is pretty much useless. The percentages you get aren't ancestry, they are distances in a forced fit. By the way, there are plenty of ADMIXTURE runs in different papers. Try some of these and see if the results are the same.

If I understand you correctly - if C's descendants A and B later merge, rather than admix with other populations, then they are likely to show a greater proportional descent from C. This is what I am measuring, rather than separation times.

A and B are full 100% unadmixed descendants of C. However, measuring with Fst alone, one will consider a merged A+B closer to C than either A or B.
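For comparison, here is a minimal sketch of how a D-statistic treats the same situation, using the allele-frequency form D(W, X; Y, Z) = sum((w - x)(y - z)) / sum((w + x - 2wx)(y + z - 2yz)). The toy tree, the drift sizes and the 30% admixture fraction are all assumptions made up for the example, with C as a sister population of A and B and a separate outgroup.

```python
import random


def d_stat(w, x, y, z):
    """D(W, X; Y, Z) computed from population allele frequencies."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    return num / den


def drift(freqs, sd, rng):
    """Toy drift: add Gaussian noise to each frequency, clipped to [0.01, 0.99]."""
    return [min(0.99, max(0.01, p + rng.gauss(0, sd))) for p in freqs]


rng = random.Random(42)

# Invented tree: root -> outgroup, root -> node -> C, node -> parent -> A and B.
root = [rng.uniform(0.3, 0.7) for _ in range(20000)]
outgroup = drift(root, 0.10, rng)
node = drift(root, 0.05, rng)
pop_c = drift(node, 0.12, rng)
parent = drift(node, 0.05, rng)
pop_a = drift(parent, 0.10, rng)
pop_b = drift(parent, 0.10, rng)

# Merge of A and B: same ancestry as A, only the drift is averaged out.
merged = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]
# Genuine admixture: A with 30% real ancestry from C.
admixed = [0.7 * a + 0.3 * c for a, c in zip(pop_a, pop_c)]

d_merged = d_stat(merged, pop_a, pop_c, outgroup)
d_admixed = d_stat(admixed, pop_a, pop_c, outgroup)

print(f"D(merged, A; C, outgroup)  = {d_merged:.4f}")
print(f"D(admixed, A; C, outgroup) = {d_admixed:.4f}")
```

In this sketch D stays near zero for the merged population, because C shares no more alleles with it than with A, while the population with real ancestry from C gives a clearly positive D. Real analyses would also need standard errors, for example from a block jackknife, before reading anything into the numbers.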
 
The way you use it is pretty much useless. The percentages you get aren't ancestry, they are distances in a forced fit. By the way, there are plenty of ADMIXTURE runs in different papers. Try some of these and see if the results are the same.
I could do, but usually the ranges in these papers are less extensive, there are fewer components, and there seems little point in repeating a laborious exercise if people consider it useless.

A and B are full 100% unadmixed descendants of C. However, measuring with Fst alone, one will consider a merged A+B closer to C than either A or B.
If Bell Beaker and Corded Ware were A and B (both principally descended from Yamnayan C) that ended up coming back together in Germany, this would make German samples on Admixture appear closer to Yamnaya than they are to other BB or CW. However, exactly the opposite is the case - all such samples are significantly closer to each other than to Yamnaya in respect of all inter-component relationships. This would suggest that Yamnayans must be even less close ancestrally to BB and CW than my results estimate.

If there are other studies which show that Yamnaya has a better fit with BB and CW than Suvorovo or Khvalynsk does, then this would make me less confident that the spread of Steppe DNA occurred before Yamnaya arose. In the absence of such studies, it makes more sense to me to go with the clear results obtained from the best fit analysis - that BB and CW each look Suvorovo/East Balkan in origin. Not only does this tie up with my yDNA branching estimates (that the splits between Eastern and Western branches of R1b-L23 and R1a-M417 preceded the Yamnayan westward expansions), but it would also explain what happened to all the DNA from the Suvorovo/early Steppe people that we know had spread far and wide over an area encompassing Hungary, Romania, Bulgaria, Moldova and Eastern Ukraine by 4,000 BC (and even, it appears, North Central Spain by the mid fourth millennium BC).
 
For what it's worth, I ran best-fit calculations for some other early Steppe component populations:
1. The Steppe component in Northern Funnel Beaker comes out as early/indigenous and unrelated to Suvorovo or Yamnaya.
2. Balkan Yamnaya comes out as unrelated to Suvorovo/Corded Ware/Bell Beaker and best fits with an admixture between various Eastern peoples (Russian Yamnaya, Maykop and Dereivka) with Cucuteni Tripolye.

My reading of the autosomal data is as follows:
Early Scandinavian Steppe-like DNA most likely arrived during the Mesolithic with basal clades of R1a.
Khvalynsk-like people of mixed yDNA (Suvorovo) arrived in the Western Pontic and admixed with locals to form proto-BB/CW/Vucedol. Corded Ware admixed little, Bell Beaker was the product of some admixture with Globular Amphora, and Vucedol with various Southern European EEF.
About 1,500 years later, another Khvalynsk-descendant population (Yamnaya) moved westwards into the Balkans, together with some Maykop and Dereivka, and admixed with Cucuteni-Tripolye people. Yamnaya was probably the catalyst, as the resulting culture was theirs and they also seem to have moved into Corded Ware territory in the Eastern Baltic. These moves displaced Corded Ware westwards, which in turn displaced Bell Beaker further westwards.

I would be interested to hear of any evidence that might lead me to revise or refine these tentative estimates.
 
Is it about autosomal or Y-DNA?
I think it's a very complex process, like one of hydrodynamics, with little data about the ancient DNA. Y-DNA is too small a part of the genetic code. Ancient cultures could spread from one population to another without massive population movements, the same as Y-DNA, almost without affecting autosomal DNA.
 
I could do, but usually the ranges in these papers are less extensive, there are fewer components, and there seems little point in repeating a laborious exercise if people consider it useless.

Frankly, I respect the effort you put into this. You might want to take the steep learning curve and learn how to use the Reich labs tools or ADMIXTURE yourself. The datasets are mostly freely available. But this is simply not going to fly. I might be wrong in this but then prudence would dictate that you at least verify it with other ADMIXTURE runs to show that these produce similar results.


If Bell Beaker and Corded Ware were A and B (both principally descended from Yamnayan C) that ended up coming back together in Germany, this would make German samples on Admixture appear closer to Yamnaya than they are to other BB or CW. However, exactly the opposite is the case - all such samples are significantly closer to each other than to Yamnaya in respect of all inter-component relationships. This would suggest that Yamnayans must be even less close ancestrally to BB and CW than my results estimate.

They all have picked up farmer ancestry, which is highly related.

If there are other studies which show that Yamnaya has a better fit with BB and CW than Suvorovo or Khvalynsk does, then this would make me less confident that the spread of Steppe DNA occurred before Yamnaya arose. In the absence of such studies, it makes more sense to me to go with the clear results obtained from the best fit analysis - that BB and CW each look Suvorovo/East Balkan in origin.

No, they look more East Balkan in one of the ADMIXTURE runs that Genetiker made. The K=14 run.

Not only does this tie up with my yDNA branching estimates (that the splits between Eastern and Western branches of R1b-L23 and R1a-M417 preceded the Yamnayan westward expansions), but it would also explain what happened to all the DNA from the Suvorovo/early Steppe people that we know had spread far and wide over an area encompassing Hungary, Romania, Bulgaria, Moldova and Eastern Ukraine by 4,000 BC (and even, it appears, North Central Spain by the mid fourth millennium BC).
 
@Pip

Two notes: Models only show that a certain ancestry is genetically feasible; they aren't definitive proof. Also, the more methods you use, the more evidence you muster.

Take as a lesson, for instance, how amateurs started to notice that admixture runs showed some European admixture in American Indians. It was replicated in Treemix. However, it turned out that this was shared ancient ancestry which we now call ANE. Only with a combination of genetics, archaeology and common sense can you rebuild an ancestry tree this deep.
 
Frankly, I respect the effort you put into this. You might want to take the steep learning curve and learn how to use the Reich labs tools or ADMIXTURE yourself. The datasets are mostly freely available. But this is simply not going to fly. I might be wrong in this but then prudence would dictate that you at least verify it with other ADMIXTURE runs to show that these produce similar results.
I don't think I really have the time, and am in any case suspicious of admixture, which is easier to manipulate than yDNA. I was only really looking at it to see whether it matches what yDNA seems to indicate, and to help fill in the gaps caused by bottlenecks in yDNA.

They all have picked up farmer ancestry, which is highly related.
The point is that it only seems to be Khvalynsk admixed with this EEF (and not Yamnaya admixed with EEF) that provides close fits with Bell Beaker and Corded Ware.

No, they look more East Balkan in one of the ADMIXTURE runs that Genetiker made. The K=14 run.
I don't know what they look like in the other admixture runs. I use the K=14, as that is where Genetiker publishes the most extensive database.
 
@Pip

Two notes: Models only show that a certain ancestry is genetically feasible; they aren't definitive proof. Also, the more methods you use, the more evidence you muster.
Agreed. If people think there will ever be definitive proof about exactly who bred with whom and where they were located when they did, they will ultimately be disappointed. I'm just looking at what is feasible, given the data, and what is infeasible. From the data I have seen, I would say that the Yamnaya (as generally understood) are only a remote possibility as the major source of pan-European Steppe DNA, and not what I would call a credible one.

The gaps I was keen to fill were between extant Western and Eastern branches of R1b-M269 and R1a-M417, which I estimate to have split during the 5th millennium BC. Most of the major branches of these haplogroups seem to coalesce to either Western Europe or the Caucasus, so what was the substantial path between the two? My combination SNP/STR analysis indicates Poland or North Western Ukraine as the slightly most likely overall origin point, with an eastwards branching shortly afterwards; whereas Genetiker's K=14 autosomal data indicates something similar, but in reverse.

For R1b-M269 the most likely major route to me now looks something like this - Azov split mid-fifth millennium BC.
Azov>W Pontic>Moldova>Poland>S/C Germany>N France (R1b Bell Beaker expansion point)
Azov>Steppe Maykop>Caucasus>NE Turkey/Armenia

For R1a-M417, the most likely major route looks like - North Ukraine split circa 4,000 BC (Corded Ware expansion point some time later)
Azov>W Pontic>North Ukraine>S Baltic>Scandinavia
Azov>W Pontic>North Ukraine>E Baltic>Poland>E Europe
Azov>W Pontic>North Ukraine>E/C Europe>Caspian>Middle East/N India

Other branches look largely to have shrivelled and died out.

I am interested in any data that might cause these estimates to be revised or refined.
 
I might be wrong in this but then prudence would dictate that you at least verify it with other ADMIXTURE runs to show that these produce similar results.
I cannot find other published admixture databases that have the same extensive coverage as Genetiker's K=14 (including over 3,000 samples). He has also compiled K=16 etc., but with insufficient samples and breakdowns to replicate my K=14 best-fit tests, so I will have to leave them as they stand.
 
