More precise R1b subclade estimates using Nordtvedt's methodology

Depends what you call huge. 50,000 samples would still be only 0.01% of the European population, even less if we include other Westerners of European descent + Asian R1b. It's easy to miss minor lineages isolated for a long time from the mainstream with less than 0.01% of the population sampled.
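
To put a rough number on the "easy to miss" point, here is a minimal sketch of the chance that a lineage is absent from a sample altogether. It assumes simple random sampling (which project databases are not), and the lineage frequencies are made up for illustration; `prob_lineage_missed` is just a hypothetical helper name.

```python
def prob_lineage_missed(lineage_freq: float, sample_size: int) -> float:
    """Chance that a random sample of `sample_size` men contains
    no carrier of a lineage with population frequency `lineage_freq`."""
    return (1.0 - lineage_freq) ** sample_size

SAMPLE_SIZE = 50_000  # the sample size discussed in this thread

# Hypothetical lineage frequencies, chosen only for illustration.
for freq in (1e-3, 1e-4, 1e-5, 1e-6):
    p = prob_lineage_missed(freq, SAMPLE_SIZE)
    print(f"frequency {freq:.0e}: P(missed) = {p:.3f}")
```

Under those assumptions a lineage carried by 1 man in 100,000 is missed about 61% of the time, while one carried by 1 man in 1,000 is practically never missed. A biased sample could shift these odds in either direction.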

I would say that 50,000 is a huge sample size, yes, although I agree that it could be biased. A biased sample in these calculations could knock off the modal, systematically bringing the TMRCA estimate down, or, if the modal is unbiased but the variance calculation is biased (more likely to be happening here IMHO), it could systematically make the error bars too small.

Of course it all depends whether your definition of TMRCA is based on the most recent common ancestor of only the group of people tested, or of all the people alive today belonging to that subclade. I think that you are using the first definition and I the second.

OK, we were sort of talking past each other rather than having a real disagreement then... To be clear, these estimates are for the first definition of TMRCA.
 
STR diversity is still instructive, even if only for positioning haplogroups in relation to each other. There is no doubt that mutations accumulate over time (number of generations.)

Busby's paper is used in some quarters to claim all STR information is bad or to be avoided (boycotted). Why throw the baby out with the bath water?

Even Busby thinks STR diversity is very important and uses it for his core counter-argument to Balaresque. Here it is in the conclusion section:
Alternatively, if R-S127 originated prior to the Neolithic wave of expansion, then either it was already present in most of Europe before the expansion, or the mutation occurred in the east, and was spread before or after the expansion, in which case we would expect higher diversity in the east closer to the origins of agriculture, which is not what we observe.

I also find a very strange anomaly in Busby's logic. In their critical calculations, they used ten STRs across all of the R1b-S127 (L11) data to analyze STR diversity by geography. I think ten is too limiting. Ken Nordtvedt argues that each STR is its own experiment, so increasing the number of STRs improves accuracy. He is a former member of the U.S. National Science Board, and I don't think you'll find many who can hold a candle to his grasp of statistics. That's not the anomaly anyway.

Busby et al. say they selected the ten STRs based on their ability to correlate with time. Busby says that "θ(R)/2μ is an estimate of the duration of linearity". This is in "Table 1-STR theta estimates." If you cross-check duration of linearity against the ten STRs they used, you'll find that six of the ten have durations of less than 5000 years. The large initial European Neolithic advances occurred roughly 7000 years ago, so by their own definition their conclusions on STR diversity across Europe are not valid for arguing against Balaresque's Neolithic R1b hypothesis!
Look for yourself. No one has been able to explain this strange anomaly in Busby's paper.
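
For anyone who wants to repeat the cross-check, here is a rough sketch of the arithmetic. It only applies the θ(R)/2μ expression quoted above; the marker names and the (θ, μ) pairs are placeholders, NOT values from Busby's Table 1, and μ is assumed to be expressed per year (if it is per generation, the result is in generations and needs converting).

```python
def duration_of_linearity(theta: float, mu: float) -> float:
    """The quoted estimate theta(R) / (2 * mu), in the time units of mu."""
    return theta / (2.0 * mu)

def markers_linear_through(markers, horizon):
    """Names of markers whose duration of linearity reaches `horizon`."""
    return [name for name, (theta, mu) in markers.items()
            if duration_of_linearity(theta, mu) >= horizon]

# Placeholder (theta, mu-per-year) pairs, NOT taken from Busby's Table 1.
example_markers = {
    "STR_A": (0.5, 5e-5),  # about 5,000 years
    "STR_B": (1.0, 5e-5),  # about 10,000 years
    "STR_C": (0.3, 5e-5),  # about 3,000 years
}

NEOLITHIC_HORIZON_YEARS = 7000  # rough date used in the post above

print(markers_linear_through(example_markers, NEOLITHIC_HORIZON_YEARS))
```

With the real Table 1 values substituted in, the list shows which of the ten STRs are still usable at a 7000-year horizon.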
 
.... It wouldn't throw off the interclade TMRCA much anyway....
This is correct. This is the value of interclade TMRCA calculations. We are comparing two larger branches so missing some of the twigs on each individual branch is negated. In Generations6, Nordtvedt also used "nested variance" in the comparison of the two branches. This segregates the variance between the two and helps improve precision.

Still, all these estimates are only approximations.
 
... For all we know, P312 and U106 could have existed as a stagnating tiny minority of the PIE population in the Pontic Steppes for over a thousand years before the lineage takes off. If there were only a few men, or even a few dozen men carrying those mutations, and side branches got pruned regularly by wars and diseases, the STR variance would not have evolved much. There is no way to know that based on modern STR values.....
Here is another way to look at things rather than just looking at TMRCA estimates. Modal haplotypes are not the ancestral haplotypes, but they provide us information on what those might have been.
From DNA projects, we now have over 1,000 haplotypes at 111 STRs from people who are deep clade tested under R-M269. Ysearch only holds 96 STRs, so I've loaded it with the following 96-marker modals:

2YYB6 R1b-L11* (S127*) Paragroup
XQJ7H R1b-P312 (S116) and all Subclades (this includes U152, L21 and Z196)
QM4ES R1b-U152 (S28) and all Subclades
K9VGV R1b-L21 (S145) and all Subclades
PEMD5 R1b-Z196 and all Subclades
N5PA5 R1b-U106 (S21) and all Subclades

If you were a genetic genealogist researching a set of families with the surname variants Richards, Ricardo, Rikkert and Rhisiart and you found these genetic distances (GDs) at 96 markers, what would you think?

L11(S127)* to P312(S116) __ 3
L11(S127)* to U152(S28) ___ 2
L11(S127)* to L21(S145) ___ 6
L11(S127)* to Z196 ________ 6
L11(S127)* to U106(S21) ___ 5

P312(S116) to U152(S28) ___ 1
P312(S116) to L21(S145) ___ 3
P312(S116) to Z196 ________ 4
P312(S116) to U106(S21) ___ 5

L21(S145) to U152(S28) ____ 4
L21(S145) to Z196 _________ 6

U152(S28) to Z196 _________ 6


I don't know where this family started but they were closely related and apparently very assertive in expanding. Maybe that would explain the breadth of their proliferation. They were expanding over new ground quickly (with some advantage) rather than all staying at home and in-fighting.
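
The GDs listed above are easy to put into a small lookup if you want to play with them. This is only a convenience sketch: the numbers are copied from the list, the short labels and the helper names `gd` and `nearest` are my own, and pairs that were not listed (U106 against L21, U152 and Z196) are simply left out rather than guessed.

```python
# Genetic distances (GD) at 96 markers, copied from the list above.
GD = {
    ("L11*", "P312"): 3, ("L11*", "U152"): 2, ("L11*", "L21"): 6,
    ("L11*", "Z196"): 6, ("L11*", "U106"): 5,
    ("P312", "U152"): 1, ("P312", "L21"): 3, ("P312", "Z196"): 4,
    ("P312", "U106"): 5,
    ("L21", "U152"): 4, ("L21", "Z196"): 6,
    ("U152", "Z196"): 6,
}

def gd(a: str, b: str):
    """Look up a GD in either order; None if the pair was not listed."""
    return GD.get((a, b), GD.get((b, a)))

def nearest(modal: str):
    """Closest other modal(s) to `modal` among the listed pairs."""
    others = {x for pair in GD for x in pair} - {modal}
    known = {o: gd(modal, o) for o in others if gd(modal, o) is not None}
    best = min(known.values())
    return sorted(o for o, d in known.items() if d == best), best

for m in ("L11*", "P312", "U152", "L21", "Z196", "U106"):
    print(m, nearest(m))
```

No listed pair is further apart than GD 6 at 96 markers, and the P312 modal is the nearest listed neighbour of U152, L21 and Z196 alike.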
 
Thanks for the informative responses, Mike. I appreciate your input on Busby in particular... that's some good analysis, including the criticism at the end, I haven't read about that before.

STR diversity is still instructive, even if only for positioning haplogroups in relation to each other. There is no doubt that mutations accumulate over time (number of generations.)

What do you think of geographic STR diversity analysis? That's not really what's going on here, but Dienekes has saved some of his sharper criticism for that in particular. Personally I think the criticism is rubbish for the reasons I outlined earlier, although I acknowledge that different circumstances can produce similar patterns unexpectedly, so our confidence in such analyses should be checked.

This is correct. This is the value of interclade TMRCA calculations. We are comparing two larger branches so missing some of the twigs on each individual branch is negated. In Generations6, Nordtvedt also used "nested variance" in the comparison of the two branches. This segregates the variance between the two and helps improve precision.

Right, I think this needs to be highlighted, and I probably didn't do that enough in my initial post. Maciamo is probably right about sample bias... but it doesn't matter in the interclade TMRCA calculation. The nested variance calculation can have its error bars narrowed more than it should with a biased sample though, right? Am I understanding that correctly?

Here is another way to look at things rather than just looking at TMRCA estimates. Modal haplotypes are not the ancestral haplotypes, but they provide us information on what those might have been.

It's interesting that Nordtvedt uses modal as a byword for founding, though, or at least he has in the past. It's easy to imagine that the modal is a good approximation of the founder in most cases, at least if the calculation is done right and considers the fact that a descendant tree can be poorly balanced. But it's also possible to imagine modal calculations gone wrong. (A horror film for population geneticists?)

How high do you see the possibility for an error in the modal calculations here? I see little...
 
....
It's interesting that Nordtvedt uses modal as a byword for founding, though, or at least he has in the past. It's easy to imagine that the modal is a good approximation of the founder in most cases, at least if the calculation is done right and considers the fact that a descendant tree can be poorly balanced. But it's also possible to imagine modal calculations gone wrong. (A horror film for population geneticists?)...
Nordtvedt's Generations6-1 uses modal calculations in some of the output but not, according to Ken, in the "nested variance" calculations, which are the "new" part of his method. I'm not statistician enough to argue all of this. I would refer you to Ken's web site, where he describes the formulas, or to Rootsweb, where you can ask him directly yourself.
 
....What do you think of geographic STR diversity analysis? That's not really what's going on here, but Dienekes has saved some of his sharper criticism for that in particular. ...
The whole idea of boycotting STR diversity analysis is going off the deep end, but it is important to keep it in perspective. It only provides approximations.

The topic of geographic STR diversity is fraught with gray areas. When we look at STR diversity within subclades, we are evaluating within known clades that have definite most recent common ancestors. When you start looking at STR diversity across geographies, you can't be sure that the people in one geography are more closely related to each other than the people in another. Anatole Klyosov calls this issue "phantom" common ancestors.

This does not mean I think all STR diversity analysis across geographies is worthless, just that you have to try to determine which geographies are origins or launch points versus crossroads or pooling points. STR diversity by geography is just another piece of data to use and cross-reference to archaeology, history, linguistic theory, etc.

... It is important to narrow the gray areas by breaking the haplotypes into the deepest subclades possible. On the other hand, subclade (haplogroup) diversity is another indicator, not just STR diversity, although the latter may imply the former.
 
This does not mean I think all STR diversity analysis across geographies is worthless, just that you have to try to determine which geographies are origins or launch points versus crossroads or pooling points. STR diversity by geography is just another piece of data to use and cross-reference to archaeology, history, linguistic theory, etc.

Thanks. This is exactly what I've been getting at, put more concisely. I would also add that the frequency of "pooling points" has almost certainly increased since the beginning of the Modern Age.
 
To make Dienekes' stance clearer...I am forced to somewhat defend him here...hahaha

Dienekes in his own words:
"...my opinion of Y-STRs as a tool for inferring past population movements is, to put it mildly, low. When Bahamian Y-STR variance is higher than African one, and E-V13, one of the youngest European Y-haplogroups (in terms of Y-STR variance) turns up in Spain in one of the earliest ancient DNA samples, it goes without saying that the burden of proof is on those who wish to continue to talk about Neolithic or other population movements to make the assumptions of their models clearer. Nonetheless, there is still some utility in Y-STRs..."

Furthermore, he quotes the paper that I posted as a topic here...Herrera et al. 2011:

"From the paper:
However, owing to the contentions associated with the current calibrations of the Y-STR mutation rates,32,34,35,41 as well as the limitations of the assumptions utilized by the methodologies for time estimations, the absolute dates generated in this study should only be taken as rough estimates of upper bounds.

Indeed. We are at the point where Y-STRs are at the end of their utility, but the replacement technology of extensive Y-chromosome sequencing has not quite arrived in an economical way yet."

One last thing from "his own mouth":

"And, the story has other complications. From the current paper:

[The relative expansion times for haplogroup J2-M172 (Table 4) generally correspond with those yielded for R1b-M343, with the exception of Greece and Crete, which, unlike haplogroup R1b-M343, are slightly older than the dates yielded for several of the Near Eastern groups as well as the four Armenian populations.]



As mentioned above, I don't give much weight on Y-STR evidence, but observations such as the above certainly add to the feeling of unease that something is not quite right with the default picture of prehistory."

So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.
 
So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.

Let me be clear what I'm disagreeing with then... I disagree with these:

Dienekes said:
Y-STRs are effectively dead for age estimation
Dienekes said:
We are at the point where Y-STRs are at the end of their utility

...as well as some of the criticisms he presents to geographic diversity analysis. I mean, he says that "there is still some utility in Y-STRs" but then rejects any substantive analysis using them.

Let me address some of his points, briefly:

Dienekes said:
When Bahamian Y-STR variance is higher than African one...

This is a dumb criticism of Y-STR geographic diversity analysis, as I've said already. We expect greater diversity at, as Mike puts it, a "pooling point" like the Bahamas than in an Old World population with all the genetic bottlenecks and founder effects of the past.

Dienekes said:
...and E-V13, one of the youngest European Y-haplogroups (in terms of Y-STR variance) turns up in Spain in one of the earliest ancient DNA samples...

Dienekes doesn't really understand what is known about E1b-V13 if he's calling it "one of the youngest European Y-haplogroups." It has certain expansions which are quite young, yes, and most European E1b folk are descended from very recent E1b founders, including a very young Southeastern European founder, but it isn't really a young clade. An interesting commentary is Steve Bird on King 2011.

Dienekes said:
...it goes without saying that the burden of proof is on those who wish to continue to talk about Neolithic or other population movements to make the assumptions of their models clearer.

I can agree with him on this, though.
 
So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.
So why does he say "STRs $%Sck"? Why does he say he is boycotting them? That's what I mean by going off the deep end, and it leaves people with the wrong impression.

And by the way, if he thinks the utility of Y-STR diversity is so limited, what does he propose instead? No one is saying to use STR diversity in isolation. It is just another tool. Why throw it out? We know for sure that frequency can be very misleading in terms of origin. I know of a place where I think R-L21 frequency is very, very high, perhaps higher than in Ireland. It's O'Neill, Nebraska. Does that mean L21 originated there?
Of course not, but this is the same as the argument he uses about the Bahamas, etc. This kind of counter-argument is an "overwhelming exception" logical fallacy. http://en.wikipedia.org/wiki/Overwhelming_exception

We have to use these tools together. Why throw out a vice or anvil? Alone, they may not solve many problems but with a hammer and fire one can forge metal.

The whole "burden of proof" argument is another logical fallacy. It's unreasonable to expect that we can prove much of anything (other than expansive generalities) about things that happened 4-10K years ago. We are all smart enough to handle ambiguity. We are looking for the most likely alternatives and trying to eliminate the unlikely ones along the way.
 
I agree with Maciamo,
The arrival of a gene in a given region is NOT the date of birth of that gene. Given the high % of the R1-b subclasses in today's population of Europe, I would think that they were present in more than one individual by 2500BC.

For me the most likely scenario is still the appearance of R1b right before the ice age maximum around 25,000 BP, and the appearance of the subclasses right after the end of the ice age around 10,000 BP, as the R1b tribes lived in the caves of Bashkortostan.
 
The arrival of a gene in a given region is NOT the date of birth of that gene.

Obviously, the first is always after the second.

Given the high % of the R1-b subclasses in today's population of Europe, I would think that they were present in more than one individual by 2500BC.

OK, but you'll need some evidence, like a serious challenge to these TMRCA estimates or ancient DNA to prove it. "I would think" doesn't help much. Right now, 2500BC is outside the error bars for both P312 and U106, meaning that if these calculations are right, there were zero P312 or U106 carriers in 2500BC.

For me the most likely scenario is still the appearance of R1b right before the ice age maximum around 25,000 BP, and the appearance of the subclasses right after the end of the ice age around 10,000 BP, as the R1b tribes lived in the caves of Bashkortostan.

Which subclades specifically do you think came out of Bashkortostan after the Ice Age? What evidence do you have for it?
 
I agree with Maciamo. The arrival of a gene in a given region is NOT the date of birth of that gene. Given the high % of the R1-b subclasses in today's population of Europe, I would think that they were present in more than one individual by 2500BC. ...

I just wanted to clear up a couple of things on the estimates that Nordtvedt's methodology produces. The most important thing is that you should go to his web site and read through his charts documenting his formulas. http://knordtvedt.home.bresnan.net/

The following quotes are from a string of emails between Ken and myself. They were person to person, but I don't think he'd mind me quoting him on this because they are just clarifications of his published method. Do go to his web site to understand everything in context.

1. The modals are NOT the basis for the interclade TMRCAs.
Ken Nordtvedt said:
The modal for each clade is only for auxiliary purposes. It plays no role whatsoever in estimating the interclade node ages or the clade coalescence ages. Its use is only for two purposes: to evaluate some sigmas and to estimate (intra)clade tmrcas which I do not consider as good as the interclade node estimates.
I inserted the "(intra)" because that is what I interpret his intent to be.

2. The interclade TMRCA estimate IS for the specific "node" man who is the Most Recent Common Ancestor of both clades (P312 & U106 in this case). It is NOT a coalescence age. It is estimating that one father-son event.
Ken Nordtvedt said:
the interclade node age estimate is for a specific event in history. Age of the father of the two sons, each of whose descendant line leads to one clade or the other.

3. His output includes coalescence ages, but they are clearly labeled as such. I interpret these ages as more akin to times of significant expansion.

Ken Nordtvedt said:
Coalescence Age for a clade is a different thing. It does not estimate time of a specific event. It is an abstract age and in words is the average tmrca of all the pairs you can form from the clade haplotype sample collection in use.
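
Ken's description of coalescence age (the average TMRCA over all the pairs in the sample) can be illustrated with a toy calculation. This is only a sketch and not Ken's Generations formulas: it assumes a simple symmetric single-step mutation model, under which the expected squared repeat difference between two men at one marker is 2μt, so each pair's TMRCA is estimated as the average squared difference divided by 2μ. The haplotypes and the mutation rate are invented.

```python
from itertools import combinations

def pair_tmrca(h1, h2, mu):
    """Estimated TMRCA (generations) of two haplotypes: ASD / (2 * mu)."""
    asd = sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1)
    return asd / (2.0 * mu)

def coalescence_age(haplotypes, mu):
    """Average of the pairwise TMRCA estimates over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    return sum(pair_tmrca(a, b, mu) for a, b in pairs) / len(pairs)

# Invented 5-marker haplotypes and an invented per-generation rate.
sample = [
    (13, 24, 14, 11, 12),
    (13, 25, 14, 11, 12),
    (13, 24, 15, 11, 13),
    (14, 24, 14, 11, 12),
]
MU = 0.002  # hypothetical mutations per marker per generation

print(round(coalescence_age(sample, MU)), "generations")
```

Note that no single pair has to share the "node" ancestor for this number to be computed, which is why it is an abstract age rather than the date of a specific event. With only five markers the estimate is extremely noisy; long haplotypes are what make it usable.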

4. Don't focus too much on the single most probable age. That undoubtedly is NOT the precise date of the MRCA. It is the range that counts.

In the example for this thread, the line "U106 & P312 Nested Age___4.5 __ (5.2-3.8)" gives a range of 5.2K to 3.8K years ago. That is the one-sigma range, so basically Ken's methodology is saying there is a 68% chance that the actual MRCA date falls in that range. That's all it is saying. 4.5K is just the most likely point in the whole range. Most people take a range like this and use the high end, but the truth is that the date is as likely to be younger as it is to be older.
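
To see where the 68% comes from, and why the high end is no more likely than the low end, here is a small check. It assumes the estimate's error is normally distributed and treats the quoted range as plus or minus one sigma; that is my reading of the output, not something taken from Ken's formulas.

```python
import math

def normal_cdf(x: float, mean: float, sigma: float) -> float:
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

# "U106 & P312 Nested Age 4.5 (5.2-3.8)", in thousands of years.
MEAN, LOW, HIGH = 4.5, 3.8, 5.2
SIGMA = (HIGH - LOW) / 2.0  # about 0.7, treating the range as +/- one sigma

p_inside = normal_cdf(HIGH, MEAN, SIGMA) - normal_cdf(LOW, MEAN, SIGMA)
p_older = 1.0 - normal_cdf(HIGH, MEAN, SIGMA)
p_younger = normal_cdf(LOW, MEAN, SIGMA)

print(f"inside 3.8-5.2K:   {p_inside:.3f}")
print(f"older than 5.2K:   {p_older:.3f}")
print(f"younger than 3.8K: {p_younger:.3f}")
```

Under that assumption roughly 16% of the probability sits beyond each end of the range, so an age older than 5.2K is exactly as likely as one younger than 3.8K.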

I know some don't want to believe young ages like these, but this is just what the numbers show (and we do have a lot of numbers [long haplotypes] now). The real argument is over the mutation rates. I don't see why we wouldn't use the germ-line rates that we use in genealogical calculations, since Ken throws out the multi-copy STRs anyway... but this whole area is debatable.
 