More precise R1b subclade estimates using Nordtvedt's methodology

sparkey

We all know that Ken Nordtvedt has been refining his STR dating estimates to get more precise estimates of Haplogroup I subclades. His latest, Generations6, has narrowed the error bars even more. But what happens when we apply the exact same methodology to get estimates for the R1b subclades? Mike Walsh tries it...

Mike Walsh said:
U106 & P312 TMRCA Age____4.6 __ (5.9-3.2)
U106 & P312 Nested Age___4.5 __ (5.2-3.8)
(Edit)

U106 TMRCA Age___________3.7 __ (4.0-3.4)
U106 Coalescence Age_____3.4 __ (3.7-3.1)

P312 TMRCA Age___________4.1 __ (4.4-3.8)
P312 Coalescence Age_____3.7 __ (4.0-3.5)


Notice the very small error range [5.2 to 3.8 thousand ybp] for U106 and P312's Most Recent Common Ancestor (MRCA) using Ken's "Nested Variance" method. Essentially this is a refinement of the non-nested age estimate directly above it. The best estimate for the MRCA of P312 and U106 is 4.5 thousand ybp (2500 BC). This MRCA would be an R-L11* person, but this is NOT the TMRCA for R-L11. Essentially it puts a lower limit on L11's TMRCA, or at least the 5.2 to 3.8 thousand ybp range does. Please read and understand Ken's method. http://knordtvedt.ho...20Variances.pdf

So, what are the implications, Eupedia? Barring a systematic error, this seems to confirm some of our suspicions.
 
Basically, that European R1b is fairly recent. Good sparkey ;)
 
I am not so sure that it is all so recent...

Increased Resolution Within Y-Chromosome Haplogroup R1b M269 Sheds Light On The Neolithic Transition In Europe

George Busby et al.

Early studies on classical polymorphisms have largely been vindicated by the growing tome of information on the genetic structure of European populations, with mtDNA, Y-Chromosome and autosomal markers all combining to give a fundamental pattern of migration from the East. The processes behind this pattern are, however, less clear, particularly with regard to uniparental markers. Much debate still rages about how best to use Y and mtDNA to date particular historical movements, or indeed if it is appropriate at all. For example, whilst some progress has been made recently in calibrating the mtDNA clock, the selection of a mutation rate with which to date the Y-Chromosome is contentious, as the two most favoured values can give dates that differ by a factor of three. In order to address this we have investigated the sub-lineages of the common European haplogroup R1b-M269. This haplogroup has been shown to be clinal in Europe, and more recently has been posited to be the result of the Neolithic expansion from the Near East. Here, we use newly characterised SNPs downstream of M269 to produce a refined picture of the haplogroup in Europe, and further show that the diversity of this lineage cannot be entirely attributed to Neolithic migration out of Anatolia. We use simple coalescent simulations to estimate an absolute lower bound for the age of the sub-haplogroups. Rather than originating with the farmers from the East, we suggest that the sub-structure of R1b-M269 visible in Europe today, and thus the great majority of European paternal ancestry, is the result of the interaction between the Neolithic wave of expansion and populations of early Europeans already present in the path of the wave.

Also, I am with Dienekes on boycotting Y-STRs...they are a quagmire.
 
I am not so sure that it is all so recent...

I don't get why Busby is useful here. Nordtvedt's methodology is more refined and better informed. In fact, it pushes the ball in the opposite direction from what Busby did in the 2011 follow-up. This is new: applying Nordtvedt's Generations6 to R1b reduces the error bars by a few centuries.

When do you place the entrance of R1b-L11 into Europe?

Also, I am with Dienekes on boycotting Y-STRs...they are a quagmire.

OK, but why? Dienekes' criticism of STR dating has bordered on stupid. Take this:

Dienekes said:
I like the line about there being substantially more Y-STR variation in E1b1a7a-U174 and E1b1a8-U175 in the Bahamas than in any African collection. I have argued for years against the central assumption of phylogeography, that the location of highest Y-STR diversity is the point of origin of a haplogroup, since Y-STR diversity can be affected both by antiquity and by admixture. Nonetheless, I keep reading papers where tiny differences in Y-STR variation, even if we forget about the noisiness of Y-STRs themselves, are taken as evidence of ancient migrations. Thankfully, the time when Y-STRs were used to infer ancient migrations is over, and the huge collection of Y-STR haplotypes amassed by population geneticists, forensic specialists, and genealogists alike can be put to uses for which it is more amenable.

...seriously Dienekes? As if we would ever expect an area colonized in the Modern Age to have the same STR diversity rules as ancient populations! There are a few assumptions that we make with STR diversity analyses that hold with ancient migrations, but not the Bahamas... like geographic isolation and smaller populations.

Hopefully you have a better argument against using STRs? I mean, in this case, it's even informed by SNPs, and correlates to ancient DNA.
 
Very, very interesting! If L11 is from around 2,500 BC, that would place it right at the time of the Beakers reaching the Lower Danube:
[Attached image]

From there, they could have migrated north and brought what would later be U106 to Northern Europe. It is also worth noting that the "stelae people" hypothesis would also be hard to sustain if these estimates hold.
 
Nordtvedt's full response to Busby (after it was brought up by Mike Walsh interestingly):

Ken Nordtvedt said:
"Contrary to common belief, estimates of ASD, and therefore T, vary widely when different subsets of STRs are used with the same sample."

[[ They should be a bit more careful with their choice of adjectives. If you divide up the STRs into subsets, certainly you will get more variation in tmrca estimates; the statistical confidence interval blows up as you try to rely on fewer and fewer independent mutational experiments through the tree. Their key figure 4, I think, shows a trend of larger tmrcas when using subsets of slower and slower STRs. But 1) this just confirms what hobbyists found out in the last few years, and 2) their graph has absurdly tiny or misrepresented error bars associated with each point on the graph; so the trend for U106/P312 node is hardly more than the realistic statistical error bars for the graph points. 3) This trend in tmrcas is believed to be associated with the fastest STRs "saturating" by having changing mutation rates for up and down mutations such that variance grows slower than linear when two steps of mutation start to build up in the final haplotypes. This trend therefore gets fractionally weaker and weaker as younger tmrcas are estimated. 4) It is erroneous in any event to include the faster STRs without downweighting, because their variance of variance begins to grow quadratically in age rather than linear in age sooner. 5) I do not use multi-copy STRs in variance estimates because of the difficulty (actually complexity) of counting normal growth of variance for them, and also for the frequency of recLOHs.

For tmrcas of 4000 or 5000 years and younger, I eliminate the multi-copy markers and properly weight the fast versus slow markers (which downweights fast markers), but otherwise do not throw out STRs. I believe the statistical confidence intervals dominate in this era, and especially when we throw in our uncertainties and biases in the marker mutation rates, especially the slow markers. Fortunately biased mutation rates do not ruin relative tmrca estimates, only the confidence in their absolute sizes.

Key tmrca differences rather than absolute tmrcas are the new frontier in my view. For example: There is the age of the L11 MRCA. There is the MRCA node age ancestral to both U106 and P312. There are the coalescence ages for U106 and P312. While each of these ages is in the 4000 or greater year range, their differences could be quite small, perhaps a few centuries. By properly combining key variances these age differences can be estimated with much, much higher absolute precision (in generations) than can the separate tmrcas. Increasingly people are trying to find snps and sort out what is going on during this era of rapid population expansions and resulting bushiness of the tree with nodes and discovered snps closer and closer together in time. These new tools of evaluating key combinations of variances or GDs and converting those properties into time estimates will be the way to go.

But you really can't dramatically reduce statistical tmrca errors by just enlarging haplotype populations. Those errors go like 1 divided by square root of total STR mutation rate for young tmrcas, and for older tmrcas like 1 divided by square root of number of STRs used in haplotypes. So you try to do tmrcas with just a few STRs at your peril, regardless of haplotype counts. KN ]]
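
For anyone who wants to see the arithmetic behind a variance-based TMRCA, here is a minimal Python sketch. It is a generic pooled-variance estimate (summed squared deviation from the modal haplotype divided by summed mutation rate), not Nordtvedt's Generations6 and not his nested-variance method. The haplotypes and mutation rates are made-up illustration values, the 25-year generation is borrowed from the figure used later in this thread, and the function names are hypothetical. Multi-copy markers, back-mutation corrections, and confidence intervals are all left out.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: 4 haplotypes typed at 3 single-copy STR markers.
HAPLOTYPES = [
    [13, 24, 14],
    [13, 24, 14],
    [13, 25, 14],
    [14, 24, 14],
]
# Hypothetical per-generation mutation rates for the 3 markers (assumed values).
RATES = [0.002, 0.003, 0.0005]


def modal_haplotype(haplotypes):
    """Most common repeat count at each marker, used as a stand-in for the founder."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def variance_tmrca(haplotypes, mutation_rates, years_per_generation=25):
    """Pooled estimate: summed variance about the modal / summed mutation rate."""
    modal = modal_haplotype(haplotypes)
    count = len(haplotypes)
    summed_variance = sum(
        sum((hap[i] - modal[i]) ** 2 for hap in haplotypes) / count
        for i in range(len(modal))
    )
    generations = summed_variance / sum(mutation_rates)
    return generations, generations * years_per_generation


def relative_error_scale(n_markers):
    """The 1/sqrt(number of STRs) scaling quoted above for older tmrcas."""
    return 1.0 / sqrt(n_markers)


if __name__ == "__main__":
    gens, years = variance_tmrca(HAPLOTYPES, RATES)
    print(f"{gens:.1f} generations, about {years:.0f} years")
    # Quadrupling the marker count halves the scale; adding haplotypes does not enter.
    print(relative_error_scale(16), relative_error_scale(64))
```

Pooling by summed rate lets the fast markers dominate the numerator less than a plain per-marker average would, but it is only one simple weighting choice. The point of `relative_error_scale` is that the number of haplotypes does not appear in it, which is the warning in the last paragraph of the quote.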
 
Didn't the Beakers spread Indo-European to the West?
 
That's far from settled, and in fact, it mismatches the apparent direction of Beaker Culture spread (West to East), at least if we go by pottery. One hypothesis that's gained some popularity on this forum is that the Beakers, although pre-IE, acted as a catalyst for the spread of IE from its easternmost extreme.

One thing I'm noticing here is that, although the U106 and P312 Nested Age seems to be narrowing on the Beaker time period somewhat, the TMRCAs of the clades themselves indicate a later spread. For example, it's easier to square the P312 TMRCA calculation with Unetice Culture than early Beaker Culture. I'm still not sure that the error bars are small enough to be confident, though, so I thought I'd toss this up for interpretation.
 
I can direct you to this very lively discussion from about 2 months ago on Dienekes' page...it is between him and Anatole Klyosov and a Mr. Lohizun (an academician)...among a few other knowledgeable people on the Y-Hap R1b issue. Hopefully this will bring more clarity to why I take this position, as well as some others.

http://dienekes.blogspot.com/2011/08/y-str-variance-of-busby-et-al-2011.html

As for the Bell Beaker folk...I was going to respond before Sparkey, but had to do something (I am at work...shhh)

He said what I was going to say, just about...I was going to mention the Iberian-speaking (Vasco-phonic?) peoples...
 
Oh, no, you're going to make me defend Anatole Klyosov. :sick:

To be perfectly clear, Klyosov has exactly the opposite problem as Dienekes... he puts too much confidence into calculations, and decides on theories based on calculations without regard to much else. So let me criticize him before I defend him, with:

Anatole Klyosov said:
In short, R1b1 arose ~16,000 ybp in Central Asia, they spoke a non-IE language (Erbin), they made their way to Europe and arrived ~4800 ybp by several routes. They brought their non-IE language to Europe that time, and only 500 BC a first IE language was found in Celts/Kelts, who - arguably - can be referred to as R1b1. Still, it is not clear, they might have been R1a1. After it, and along with disappearance of the Etruscans, Europe started switching to IE languages back again. "Back" - because IE languages are much older, and probably they were in Europe, or even dominating in Europe some 10-6 thousand years before present.

This discounts so many more likely possibilities that it's rubbish. OK, the dates are ballpark correct, and Klyosov is usually pretty good with dates. But this is very ignorant of linguistics, history, and even other haplogroups. I can find at least five things wrong: (1) the language spoken along R1b's initial route is uncertain; (2) Celtic languages diversified before 500 BC; (3) Celtic peoples were not exclusively R1; (4) Etruscan and Basque-type pre-IE are unplaced linguistically and there is no reason to assume that IE is more ancient in those areas; (5) IE migration prior to Corded Ware is unaccounted for...

So now that we're clear that I'm not a Klyosov fan, let me quote some things he gets right...

Anatole Klyosov said:
Here are a few rules of DNA genealogy:

(1) Separate a haplotype dataset into DNA-lineages. Typically, there is a mix of them in almost any dataset. In those cases a "common ancestor" is a phantom.

(2) Employ the mutation rate constant which is calibrated and which is different for ANY haplotype format. There are more than 30 haplotype formats in current use. Hence, there are more than 30 mutation rate constants which should be in use.

(3) Employ well-defined criteria to prove that every separate DNA-lineage in a dataset has one and only one common ancestor. There are several criteria...

Anatole Klyosov said:
First, they took totally wrong mutation rates - from Ballantyne et al (2010) [from father-son transmission]. The data are based on just a few mutations, and awfully unreliable.

Anatole Klyosov said:
Enough? Not quite. Just a minor thing in this context. When one works with "fast" markers, a correction for back mutations is MUCH higher. Otherwise one obtains a systematic deviation in TRCAs from slow to fast markers.

After that it goes downhill somewhat, and I'm not prepared to defend Klyosov's exact method. IIRC Nordtvedt has criticized it for claiming smaller error bars than it should.
 
Where do you think P312/U106 came from before Unetice?
 
Their ancestral migrations were from Eastern Europe, and shortly before that, the Near East. I'm not familiar with the archaeological cultures, but looking at the small L51+ L11- group here gives some idea of the geographic spread (membership in Switzerland, Italy, Poland, Hungary, Croatia, Turkey...), and the L23+ L51- group a slightly more distant idea (membership in Bulgaria, Croatia, Greece, Armenia, Iraq, Iran, Lebanon... notably high in ethnic Assyrians).

Maciamo has spent a lot more time on this than I have. You can see his summary here.
 
Thanks, Sparkey! I gave it a good reading. I guess the million-dollar question is, what language is R1b speaking when it (and fellow travelers) enters Europe?
 
These new dates sound a bit too young to me. They match almost exactly the expansion time of R1b-L11 to Western Europe (starting 4500 ybp), which seems to confirm their link with the Proto-Italo-Celto-Germanic speakers. It's just a little bit hard to believe that U106 and P312 didn't already exist back in the steppes, before the IE conquest of Europe. It's unlikely that a single U106 man and a single P312 man living 4500 years ago were the direct patrilineal ancestors of over 60% of modern Western Europeans. I think that the hordes of Indo-Europeans who invaded Europe in the Bronze Age were rather composed of men who for the most part already belonged to L11*, P312* and U106*, and probably a few deeper subclades like L21 and U152. That's why I would rather expect L11 to be at least 5500-6000 years old, while P312 and U106 should be at least 5000-5500 years old.
 
I'm not so sure. Wouldn't we expect more archaic forms of P312 and U106 in the East if that's truly where they formed? I'm not that familiar with placing the centers of diversity of those, but certainly L21 and U152 are centered in Europe.
 
I meant that if the mutations P312 or U106 appeared 5000-5500 years ago, it would have taken a few centuries before these lineages really took off and started making up a few percent of the PIE population. If the PIE population 5500 years ago was 100,000 men (no idea, just an example), how long would you expect it to take before a new mutation, found in only one man, reaches 1% of the male population (1000 people)?

Let's take the two theoretical examples to illustrate:

1) In a perfectly stable population, couples have two kids on average, one boy and one girl, who live long enough to procreate. Even if that man had twice as many sons as the average (2 boys instead of 1), and his sons, grandsons, great-grandsons, etc. always kept that rate of twice as many sons as the PIE society's average, it would take this Y-DNA lineage 10 generations (about 250 years +- 50 years) to reach 1% of the population. Of course, after that, if they keep that pace, the lineage can quickly become dominant. It only takes 5 more generations to reach 33% of the population, and if unstopped, the lineage would reach 100% two generations later, i.e. at the 17th generation after the original ancestor. Things could go even faster if a chieftain/king's lineage with many wives started procreating 4, 5, 6... times more than the rest of the male population. The main drawback of this scenario is that a lineage of thousands or tens of thousands of people cannot have enough privilege over the rest of the population to keep having more children than average. This is usually limited to the chieftain/king and some princes or high nobles, making up at best 0.1% of the population.

2) In the same stable population, let's imagine that only the king's lineage produces more offspring than the rest of the population. This time, let's give the king a harem and let him have 10 times as many sons as the average (10 instead of 1). However, only one of his sons will have the same procreation rate, while all other sons will have the average number of sons (1). The lineage therefore expands from 1 to 10, then from 10 to 19, from 19 to 28, from 28 to 37, etc. This is closer to reality, but it also takes longer, as only one man at a time has super-privilege. After 10 generations, there are still fewer than 100 men of the original lineage, or about 0.1% of the population. The progression is steady but linear rather than exponential. It would take over 100 generations (2500 years +- 500 years) to reach 1% of the population at this rhythm.
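
The two scenarios above can be checked with a few lines of Python. This is only a sketch of the arithmetic as described in the post: the 100,000-man population is the post's own example figure, the function names are hypothetical, and nothing here models deaths, wars, or changing population size.

```python
POPULATION = 100_000  # example male population from the post (not a real estimate)


def lineage_size_doubling(generations):
    """Scenario 1: every man in the lineage has 2 sons, so the lineage doubles."""
    return 2 ** generations


def lineage_size_single_heir(generations, sons=10):
    """Scenario 2: one heir per generation has `sons` sons, every other man has 1."""
    # Each generation the heir adds (sons - 1) net men; everyone else replaces himself.
    return 1 + (sons - 1) * generations


def generations_to_reach(target_count, size_after):
    """Smallest number of generations at which the lineage reaches target_count men."""
    generations = 0
    while size_after(generations) < target_count:
        generations += 1
    return generations


if __name__ == "__main__":
    one_percent = POPULATION // 100
    print(generations_to_reach(one_percent, lineage_size_doubling))     # 10
    print(lineage_size_doubling(15) / POPULATION)                       # 0.32768
    print(lineage_size_doubling(17) > POPULATION)                       # True
    print([lineage_size_single_heir(g) for g in range(5)])              # [1, 10, 19, 28, 37]
    print(generations_to_reach(one_percent, lineage_size_single_heir))  # 111
```

The doubling lineage hits 1% at generation 10 and overshoots the whole population by generation 17, while the single-heir lineage has only 91 men after 10 generations and needs 111 generations to reach 1%.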

However, whichever the scenario, things rarely, if ever, turn out that way for a single lineage. Diseases don't discriminate between royal/noble lineages and others. Wars always pruned the male population in ancient times. If the ancient Celts are any indication of the way of life of their PIE ancestors, the sons of chieftains were more likely to die at war, as the warrior class was essentially the nobility, not the peasants (this was also true in India, in Republican Rome, and in Medieval Europe). Furthermore, chieftains/kings get killed or deposed by other men, and the new dominant lineage undoes the progress of the first one.

The question is, when did R1b lineages pass the point of no return by becoming so overwhelmingly dominant that they always made up more than 50% of the male population in Western Europe? Did it happen very quickly at the beginning of the Bronze Age, or was it a long process with plenty of ups and downs with other haplogroups? We also cannot rule out that modern R1b lineages only started exceeding 50% of the population during the Iron Age. The Iron Age revolutionised warfare and made it possible to create huge armies that would overrun and destroy any neighbour stuck with sparse and expensive bronze weapons. This surely could have been a good time for the expansion of U152/S28, but also for J2 lineages in Italy and the Near East.


For all we know, P312 and U106 could have existed as a stagnating tiny minority of the PIE population in the Pontic Steppes for over a thousand years before the lineage took off. If there were only a few men, or even a few dozen men, carrying those mutations, and side branches got pruned regularly by wars and diseases, the STR variance would not have evolved much. There is no way to know that based on modern STR values. That's why I always prefer to overestimate the age of a haplogroup a bit rather than underestimate it.

I think that the STR method at best gives an approximation of the expansion time, after the real take off that makes a lineage soar exponentially for enough generations within a short time frame. Whether the mutation appeared 300 or 3000 years before that take off, there is no way to know without ancient DNA tests.
 
I meant that if the mutations P312 or U106 appeared 5000-5500 years, it would have taken a few centuries before these lineages really kicked off and started making a few percent of the PIE population.

Oh OK, I suppose you're talking about clade ages then, which are obviously going to be older than TMRCAs of the extant population currently sampled. L11 is certainly older than the P312/U106 Nested Age, for example. And it's very possible that the correct TMRCA of these is at the high end of the error bar, or even outside of the error bar; I won't discount that at all.

The rest I agree with as a whole, good post.
 
No, I am talking about the TMRCA. It should be roughly 5000-5500 ybp for P312 and U106, although I am sure the correct TMRCA cannot be calculated accurately from STRs for various reasons (limited number of samples, bias towards Western European samples, methodology doesn't take into account historical variations in population size, etc.). I tried to explain that the age of each of these subclades could be from a few hundred to a few thousand years older than that (so possibly 8000 or 9000 years old if the lineages stagnated for a long time before taking off).
 
No, I am talking about the TMRCA.

But then it's not really when it "appeared," that's where you confused me.

It should be roughly 5000-5500 ybp present for P312 and U106, although I am sure the correct TMRCA cannot be calculated accurately from STR for various reasons (limited number of samples, bias towards Western European samples, methodology doesn't take into account historical variations in population size, etc.).

Nordtvedt's methodology relies heavily on correct calculation of the modal haplotype, and a miscalculation could cause a systematic error. And I suppose you could argue that the "bias towards Western European samples" and the failure to "take into account historical variations in population size" create an incorrect modal calculation, but I don't see it. R1b, contrary to what you say, has a HUGE sample size. It wouldn't throw off the interclade TMRCA much anyway. I'm sorry, but I still fail to see where the systematic error lies.
 
R1b, contrary to what you say, has a HUGE sample size. It wouldn't throw off the interclade TMRCA much anyway. I'm sorry, but I still fail to see where the systematic error lies.

Depends what you call huge. 50,000 samples would still be only 0.01% of the European population, even less if we include other Westerners of European descent + Asian R1b. It's easy to miss minor lineages isolated for a long time from the mainstream with less than 0.01% of the population sampled. What if some early U152 moved to some mountainous parts of the Balkans or Anatolia that have never been tested? There could be fewer than 100 people left somewhere very remote from an early side lineage of the kind. Of course it all depends on whether your definition of TMRCA is based on the most recent common ancestor of only the people tested, or of all the people alive today belonging to that subclade. I think that you are using the first definition and I the second.
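
To put a rough number on how easy it is to miss a small lineage, here is a minimal Python sketch. It assumes every carrier is tested independently with the same probability, which is a simplification (real DNA project sampling is neither random nor uniform across regions), and the function name is hypothetical. The 0.01% and 100-carrier figures come from the post.

```python
def probability_lineage_missed(carriers, sampled_fraction):
    """Chance that none of `carriers` men is tested, under independent uniform sampling."""
    return (1.0 - sampled_fraction) ** carriers


if __name__ == "__main__":
    # 100 surviving carriers, 0.01% of the population sampled (figures from the post).
    missed = probability_lineage_missed(carriers=100, sampled_fraction=0.0001)
    print(f"{missed:.3f}")
```

Under these assumptions a 100-man lineage goes completely unsampled about 99% of the time. If the remote region is tested less than average, the chance of missing it is higher still.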
 
