More precise R1b subclade estimates using Nordtvedt's methodology



sparkey
21-11-11, 19:46
We all know that Ken Nordtvedt has been refining his STR dating methodology to get more precise age estimates for Haplogroup I subclades. His latest, Generations6, has narrowed the error bars even more. But what happens when we apply the exact same methodology to get estimates for the R1b subclades? Mike Walsh tries it (http://dna-forums.org/index.php?/topic/16851-tmrca-and-coalescence-age-estimates-for-r-m269-and-its-subclades/)...


Ages in thousands of years before present; ranges in parentheses:

U106 & P312 TMRCA Age: 4.6 (5.9-3.2)
U106 & P312 Nested Age: 4.5 (5.2-3.8) (Edit)

U106 TMRCA Age: 3.7 (4.0-3.4)
U106 Coalescence Age: 3.4 (3.7-3.1)

P312 TMRCA Age: 4.1 (4.4-3.8)
P312 Coalescence Age: 3.7 (4.0-3.5)

Notice the very small error range [5.2 to 3.8 thousand ybp] for the Most Recent Common Ancestor (MRCA) of U106 and P312 using Ken's "Nested Variance" method. Essentially, this is a refinement of the non-nested age estimate directly above it. The best estimate for the MRCA of P312 and U106 is 4.5 thousand ybp (about 2500 BC). This MRCA would be an R-L11* person, but this is NOT the TMRCA for R-L11. Essentially it puts a lower limit on L11's TMRCA, or at least the 5.2 to 3.8 thousand ybp range does. Please read and understand Ken's method: http://knordtvedt.home.bresnan.net/Nested%20Variances.pdf
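For readers new to STR dating, the basic idea behind variance-based age estimates can be sketched in a few lines. This is a bare-bones illustration of the generic variance method under a symmetric single-step mutation model, not Nordtvedt's Generations6 or his nested-variance calculation; the haplotypes, the mutation rates and the 25-year generation length are all hypothetical placeholders.

```python
from statistics import pvariance


def variance_tmrca_years(haplotypes, mutation_rates, years_per_generation=25.0):
    """Rough TMRCA from STR repeat-count variance (illustrative sketch).

    haplotypes: list of equal-length lists of repeat counts, one per marker.
    mutation_rates: assumed mutation rate per generation for each marker.
    Under a symmetric single-step model, each marker's variance grows by
    about mu per generation, so summed variance / summed rate ~ generations.
    """
    if any(len(h) != len(mutation_rates) for h in haplotypes):
        raise ValueError("every haplotype needs one value per marker")
    total_variance = sum(
        pvariance(marker_values) for marker_values in zip(*haplotypes)
    )
    generations = total_variance / sum(mutation_rates)
    return generations * years_per_generation


# Hypothetical two-marker sample and rates, for illustration only.
sample = [[13, 24], [14, 24], [13, 25], [14, 25]]
rates = [0.0025, 0.0025]
age = variance_tmrca_years(sample, rates)  # roughly 2500 years
```

Real estimates also weight fast and slow markers differently and drop multi-copy markers, as Nordtvedt explains later in the thread; the sketch skips all of that.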

So, what are the implications, Eupedia? Barring a systematic error, this seems to confirm some of our suspicions.

Knovas
21-11-11, 20:17
Basically, that European R1b is fairly recent. Good sparkey ;)

A. Tamar Chabadi
22-11-11, 00:10
I am not so sure that it is all so recent...

Increased Resolution Within Y-Chromosome Haplogroup R1b M269 Sheds Light On The Neolithic Transition In Europe

George Busby et al.

Early studies on classical polymorphisms have largely been vindicated by the growing tome of information on the genetic structure of European populations, with mtDNA, Y-Chromosome and autosomal markers all combining to give a fundamental pattern of migration from the East. The processes behind this pattern are however, less clear, particularly with regard to uniparental markers. Much debate still rages about how best to use Y and mtDNA to date particular historical movements, or indeed if it is appropriate at all. For example, whilst some progress has been made recently in calibrating the mtDNA clock, the selection of a mutation rate with which to date the Y-Chromosome is contentious, as the two most favoured values can give dates that differ by a factor of three. In order to address this we have investigated the sub-lineages of the common European haplogroup R1b-M269. This haplogroup has been shown to be clinal in Europe, and more recently has been posited to be the result of the Neolithic expansion from the Near East. Here, we use newly characterised SNPs downstream of M269 to produce a refined picture of the haplogroup in Europe, and further show that the diversity of this lineage cannot be entirely attributed to Neolithic migration out of Anatolia. We use simple coalescent simulations to estimate an absolute lower bound for the age of the sub-haplogroups. Rather than originating with the farmers from the East, we suggest that the sub-structure of R1b-M269 visible in Europe today, and thus the great majority of European paternal ancestry, is the result of the interaction between the Neolithic wave of expansion and populations of early Europeans already present in the path of the wave.

Also, I am with Dienekes on boycotting Y-STRs...they are a quagmire.

sparkey
22-11-11, 01:04
I am not so sure that it is all so recent...

I don't see why Busby is as useful here. Nordtvedt's methodology is more refined and better informed. In fact, it pushes in the opposite direction from Busby's 2011 follow-up. This is new: applying Nordtvedt's Generations6 to R1b reduces the error bars by a few centuries.

When do you place the entrance of R1b-L11 into Europe?


Also, I am with Dienekes on boycotting Y-STRs...they are a quagmire.

OK, but why? Dienekes' criticism of STR dating has bordered on stupid. Take this:


I like the line about there being substantially more Y-STR variation in E1b1a7a-U174 and E1b1ba8-U175 in the Bahamas than any African collection. I have argued for years that the central assumption of phylogeography, that the location of highest Y-STR diversity is not necessarily the point of origin of a haplogroup, since Y-STR diversity can be affected both by antiquity and by admixture. Nonetheless, I keep reading papers where tiny differences in Y-STR variation, even if we forget about the noisiness of Y-STRs themselves, are taken as evidence of ancient migrations. Thankfully, the time when Y-STRs were used to infer ancient migrations is over, and the huge collection of Y-STR haplotypes amassed by population geneticists, forensic specialists, and genealogists alike can be put to uses for which it is more amenable.

...seriously, Dienekes? As if we would ever expect an area colonized in the modern age to follow the same STR diversity rules as ancient populations! A few of the assumptions we make in STR diversity analyses hold for ancient migrations but not for the Bahamas, such as geographic isolation and small populations.

Hopefully you have a better argument against using STRs? I mean, in this case, it's even informed by SNPs, and correlates with ancient DNA.

Asturrulumbo
22-11-11, 01:17
Very, very interesting! If L11 is from around 2,500 BC, that would place it right at the time of the Beakers reaching the Lower Danube:
http://www.eupedia.com/forum/attachment.php?attachmentid=5366&stc=1
From there, they could have migrated north and brought what would later be U106 to Northern Europe. It is also worth noting that the "stelae people" hypothesis would also be hard to sustain if these estimates hold.

sparkey
22-11-11, 01:23
Nordtvedt's full response to Busby (http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2011-09/1315850283) (after it was brought up by Mike Walsh interestingly):


"Contrary to common belief, estimates of ASD, and therefore T, vary widely when different subsets of STRs are used with the same sample."

[[ They should be a bit more careful with their choice of adjectives. If you divide up the STRs into subsets, certainly you will get more variation in tmrca estimates; the statistical confidence interval blows up as you try to rely on fewer and fewer independent mutational experiments through the tree. Their key figure 4, I think, shows a trend of larger tmrcas when using subsets of slower and slower STRs. But 1) this just confirms what hobbyists found out in the last few years, and 2) their graph has absurdly tiny or misrepresented error bars associated with each point on the graph; so the trend for U106/P312 node is hardly more than the realistic statistical error bars for the graph points. 3) This trend in tmrcas is believed to be associated with the fastest STRs "saturating" by having changing mutation rates for up and down mutations such that variance grows slower than linear when two steps of mutation start to build up in the final haplotypes. This trend therefore gets fractionally weaker and weaker as younger tmrcas are estimated. 4) It is erroneous in any event to include the faster STRs without downweighting, because their variance of variance begins to grow quadratically in age rather than linear in age sooner. 5) I do not use multi-copy STRs in variance estimates because of the difficulty (actually complexity) of counting normal growth of variance for them, and also for the frequency of recLOHs.

For tmrcas of 4000 or 5000 years and younger, I eliminate the multi-copy markers and properly weight the fast versus slow markers (which downweights fast markers), but otherwise do not throw out STRs. I believe the statistical confidence intervals dominate in this era, and especially when we throw in our uncertainties and biases in the marker mutation rates, especially the slow markers. Fortunately biased mutation rates do not ruin relative tmrca estimates, only the confidence in their absolute sizes.

Key tmrca differences rather than absolute tmrcas are the new frontier in my view. For example: There is the age of the L11 MRCA. There is the MRCA node age ancestral to both U106 and P312. There are the coalescence ages for U106 and P312. While each of these ages are in the 4000 or greater year range, their differences could be quite small, perhaps a few centuries. By properly combining key variances these age differences can be estimated with much, much higher absolute precision (in generations) than can the separate tmrcas. Increasingly people are trying to find snps and sort out what is going on during this era of rapid population expansions and resulting bushiness of the tree with nodes and discovered snps closer and closer together in time. These new tools of evaluating key combinations of variances or GDs and converting those properties into time estimates will be the way to go.

But you really can't dramatically reduce statistical tmrca errors by just enlarging haplotype populations. Those errors go like 1 divided by square root of total STR mutation rate for young tmrcas, and for older tmrcas like 1 divided by square root of number of STRs used in haplotypes. So you try to do tmrcas with just a few STRs at your peril, regardless of haplotype counts. KN ]]
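Two points in Nordtvedt's comment lend themselves to a quick numerical check: that biased mutation rates do not ruin relative TMRCA estimates, and that statistical errors scale with the mutation rate or the number of STRs rather than with the number of haplotypes. The sketch below assumes a simple pooled estimator (summed variance divided by summed mutation rate) and uses made-up variances and rates; the error functions give only the proportional scaling he describes, not actual confidence intervals.

```python
import math


def pooled_tmrca(per_marker_variance, mutation_rates):
    """Generations ~ summed per-marker variance / summed mutation rate."""
    return sum(per_marker_variance) / sum(mutation_rates)


# Hypothetical per-marker variances for two clades typed on the same markers.
clade_a = [0.90, 1.70, 0.45]
clade_b = [0.70, 1.40, 0.35]
true_rates = [0.002, 0.004, 0.001]      # hypothetical
biased_rates = [0.003, 0.003, 0.0005]   # hypothetical mis-calibration

ratio_true = pooled_tmrca(clade_a, true_rates) / pooled_tmrca(clade_b, true_rates)
ratio_biased = pooled_tmrca(clade_a, biased_rates) / pooled_tmrca(clade_b, biased_rates)
# Absolute ages shift with the rates, but the ratio does not: with this
# estimator the summed rate cancels when one age is divided by the other.
# Other (weighted) estimators may not cancel so cleanly.


def young_error_scale(total_mutation_rate):
    """Young TMRCAs: relative error goes like 1/sqrt(total mutation rate)."""
    return 1.0 / math.sqrt(total_mutation_rate)


def old_error_scale(n_strs):
    """Older TMRCAs: relative error goes like 1/sqrt(number of STRs)."""
    return 1.0 / math.sqrt(n_strs)


# Neither function takes a haplotype count, so adding haplotypes does not
# enter the scaling. Quadrupling the STRs only halves the (older) error.
gain = old_error_scale(12) / old_error_scale(48)  # about 2
```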

Farquharson
22-11-11, 07:08
Didn't the Beakers spread Indo-European to the West?

sparkey
22-11-11, 18:17
That's far from settled, and in fact, it mismatches the apparent direction of Beaker Culture spread (West to East), at least if we go by pottery. One hypothesis that's gained some popularity on this forum is that the Beakers, although pre-IE, acted as a catalyst for the spread of IE from its easternmost extreme.

One thing I'm noticing here is that, although the U106 and P312 Nested Age seems to be narrowing on the Beaker time period somewhat, the TMRCAs of the clades themselves indicate a later spread. For example, it's easier to square the P312 TMRCA calculation with Unetice Culture than early Beaker Culture. I'm still not sure that the error bars are small enough to be confident, though, so I thought I'd toss this up for interpretation.

A. Tamar Chabadi
22-11-11, 18:49
I can direct you to this very lively discussion from about two months ago on Dienekes' blog... it is between him, Anatole Klyosov and a Mr. Lohizun (an academician)... among a few other knowledgeable people on the Y-hap R1b issue. Hopefully this will clarify why I, along with some others, take this position.

http://dienekes.blogspot.com/2011/08/y-str-variance-of-busby-et-al-2011.html

As for the Bell Beaker folk...I was going to respond before Sparkey, but had to do something (I am at work...shhh)

He said what I was going to say, just about...I was going to mention the Iberian-speaking (Vasco-phonic?) peoples...

sparkey
22-11-11, 19:33
Oh, no, you're going to make me defend Anatole Klyosov. :sick:

To be perfectly clear, Klyosov has exactly the opposite problem from Dienekes... he puts too much confidence in calculations, and decides on theories based on calculations without regard to much else. So let me criticize him before I defend him, with:


In short, R1b1 arose ~16,000 ybp in Central Asia, they spoke a non-IE language (Erbin), they made their way to Europe and arrived ~4800 ybp by several routes. They brought their non-IE language to Europe that time, and only 500 BC a first IE language was found in Celts/Kelts, who - arguably - can be referred to as R1b1. Still, it is not clear, they might have been R1a1. After it, and along with disappearance of the Etruscans, Europe started switching to IE languages back again. "Back" - because IE languages are much older, and probably they were in Europe, or even dominating in Europe some 10-6 thousand years before present.

This discounts so many possibilities that are more likely that it's rubbish. OK, the dates are ballpark correct, and Klyosov is pretty good with dates usually. But this is very ignorant of linguistics, history, and even other haplogroups. I can find more than five things wrong: (1) R1b's initial route language is uncertain; (2) Celtic languages diversified before 500BC; (3) Celtic peoples were not exclusively R1; (4) Etruscan and Basque-type pre-IE are unplaced linguistically and there is no reason to assume that IE is more ancient in those areas; (5) IE migration prior to Corded Ware is unaccounted-for...

So now that we're clear that I'm not a Klyosov fan, let me quote some things he gets right...


Here are a few rules of DNA genealogy:

(1) Separate a haplotype dataset into DNA-lineages. Typically, there is a mix of them in almost any dataset. In those cases a "common ancestor" is a phantom.

(2) Employ the mutation rate constant which is calibrated and which is different for ANY haplotype format. There are more than 30 haplotype formats in current use. Hence, there are more than 30 mutation rate constants which should be in use.

(3) Employ well-defined criteria to prove that every separate DNA-lineage in a dataset has one and only one common ancestor. There are several criteria...


First, they took totally wrong mutation rates - from Ballantyne et al (2010) [from father-son transmission]. The data are based on just a few mutations, and awfully unreliable.


Enough? Not quite. Just a minor thing in this context. When one works with "fast" markers, a correction for back mutations is MUCH higher. Otherwise one obtains a systematic deviation in TRCAs from slow to fast markers.

After that it goes downhill somewhat, and I'm not prepared to defend Klyosov's exact method. IIRC Nordtvedt has criticized it for claiming smaller error bars than it should.

Farquharson
22-11-11, 21:39
Where do you think P312/U106 came from before Unetice?

sparkey
22-11-11, 21:59
Their ancestral migrations were from Eastern Europe, and shortly before that, the Near East. I'm not familiar with the archaeological cultures, but looking at the small L51+ L11- group here (http://www.familytreedna.com/public/ht35new/default.aspx?section=yresults) gives some idea of the geographic spread (membership in Switzerland, Italy, Poland, Hungary, Croatia, Turkey...), and the L23+ L51- group a slightly more distant idea (membership in Bulgaria, Croatia, Greece, Armenia, Iraq, Iran, Lebanon... notably high in ethnic Assyrians).

Maciamo has spent a lot more time on this than I have. You can see his summary here (http://www.eupedia.com/europe/Haplogroup_R1b_Y-DNA.shtml).

Farquharson
22-11-11, 23:51
Thanks, Sparkey! I gave it a good reading. I guess the million-dollar question is, what language is R1b speaking when it (and fellow travelers) enters Europe?

Maciamo
23-11-11, 12:52
These new dates sound a bit too young for me. They match almost exactly the expansion time of R1b-L11 to Western Europe (starting 4500 ybp), which seems to confirm their link with the Proto-Italo-Celto-Germanic speakers. It's just a little bit hard to believe that U106 and P312 didn't already exist back in the steppes, before the IE conquest of Europe. It's unlikely that a single U106 man and a single P312 man living 4500 years ago were the direct patrilineal ancestors of over 60% of modern Western Europeans. I think that the hordes of Indo-Europeans who invaded Europe in the Bronze Age were mostly men who already belonged to L11*, P312* and U106*, and probably a few deeper subclades like L21 and U152. That's why I would rather expect L11 to be at least 5500-6000 years old, while P312 and U106 should be at least 5000-5500 years old.

sparkey
23-11-11, 22:08
I'm not so sure. Wouldn't we expect more archaic forms of P312 and U106 in the East if that's truly where they formed? I'm not that familiar with placing the centers of diversity of those, but certainly L21 and U152 are centered in Europe.

Maciamo
23-11-11, 23:02
I meant that if the P312 or U106 mutations appeared 5000-5500 years ago, it would have taken a few centuries before these lineages really took off and came to make up a few percent of the PIE population. If the PIE population 5500 years ago was 100,000 men (I have no idea, this is just an example), how long would you expect it to take before a new mutation, found in only one man, reaches 1% of the male population (1000 men)?

Let's take the two theoretical examples to illustrate:

1) Take a perfectly stable population where couples have two children on average, one boy and one girl, who live long enough to procreate. Even if one man had twice as many sons as average (2 boys instead of 1), and his sons, grandsons, great-grandsons, etc. always kept that rate of twice as many sons as the PIE society's average, it would take this Y-DNA lineage 10 generations (about 250 years +/- 50 years) to reach 1% of the population. Of course, after that, if they keep that pace, the lineage can quickly become dominant. It takes only 5 more generations to reach about 33% of the population, and if unchecked, the lineage would reach 100% two generations later, i.e. in the 17th generation after the original ancestor. Things could go even faster if a chieftain's or king's lineage with many wives started procreating 4, 5, 6... times more than the rest of the male population. The main drawback of this scenario is that a lineage of thousands or tens of thousands of people cannot have enough privilege over the rest of the population to keep having more children than average. This is usually limited to the chieftain/king and some princes or high nobles, making up at best 0.1% of the population.

2) In the same stable population, let's imagine that only the king's lineage produces more offspring than the rest of the population. This time, let's give the king a harem and let him have 10 times more sons than average (10 instead of 1). However, only one of his sons will have the same procreation rate, while all his other sons will have the average number of sons (1). The lineage therefore expands from 1 to 10, then from 10 to 19, from 19 to 28, from 28 to 37, etc., gaining 9 men per generation. This is closer to reality, but it also takes longer, as only one man at a time has super-privilege. After 10 generations, there are only about 100 men of the original lineage (91 by this count), or 0.1% of the population. The progression is steady but linear, not exponential. At this rhythm it would take over 100 generations (2500 years +/- 500 years) to reach 1% of the population.
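Maciamo's two scenarios are easy to check numerically. A minimal sketch, assuming his illustrative population of 100,000 men and counting generations from the single founding man; the growth rules follow scenarios 1) and 2) above, and everything else about real populations (deaths, wars, a cap on growth) is ignored.

```python
def generations_to_reach(target, next_size):
    """Generations until a lineage that starts with one man reaches `target`."""
    size, generations = 1, 0
    while size < target:
        size = next_size(size)
        generations += 1
    return generations


def doubling(size):
    """Scenario 1: every man in the lineage has 2 sons instead of 1."""
    return 2 * size


def king_only(size):
    """Scenario 2: only the heir line has 10 sons; other men have 1 each,
    so the lineage gains 9 men per generation (1, 10, 19, 28, ...)."""
    return size + 9


population = 100_000              # illustrative figure from the post
one_percent = population // 100   # 1,000 men

print(generations_to_reach(one_percent, doubling))    # 10
print(generations_to_reach(population, doubling))     # 17
print(generations_to_reach(one_percent, king_only))   # 111
```

At 25 years per generation, 111 generations is roughly 2,800 years, in line with the "over 100 generations" figure.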

However, whatever the scenario, things rarely, if ever, turn out that way for a single lineage. Diseases don't discriminate between royal/noble lineages and others. Wars always pruned the male population in ancient times. If the ancient Celts are any indication of the way of life of their PIE ancestors, the sons of chieftains were more likely to die in war, as the warrior class was essentially the nobility, not the peasants (this was also true in India, in Republican Rome, and in Medieval Europe). Furthermore, chieftains/kings get killed or deposed by other men, and the new dominant lineage undoes the progress of the first one.

The question is, when did R1b lineages pass the point of no return by becoming so overwhelmingly dominant that they always made up more than 50% of the male population in Western Europe? Did it happen very quickly at the beginning of the Bronze Age, or was it a long process with plenty of ups and downs with other haplogroups? We also cannot rule out that modern R1b lineages only started exceeding 50% of the population during the Iron Age. The Iron Age revolutionised warfare and made it possible to create huge armies that would overrun and destroy any neighbour stuck with sparse and expensive bronze weapons. This could well have been a good time for the expansion of U152/S28, but also for J2 lineages in Italy and the Near East.


For all we know, P312 and U106 could have existed as a stagnating tiny minority of the PIE population in the Pontic Steppes for over a thousand years before the lineages took off. If there were only a few men, or even a few dozen men, carrying those mutations, and side branches got pruned regularly by wars and diseases, the STR variance would not have evolved much. There is no way to know that based on modern STR values. That's why I always prefer to overestimate the age of a haplogroup a bit rather than underestimate it.

I think that the STR method at best gives an approximation of the expansion time, after the real takeoff that makes a lineage soar exponentially for enough generations within a short time frame. Whether the mutation appeared 300 or 3000 years before that takeoff, there is no way to know without ancient DNA tests.

sparkey
24-11-11, 00:23
I meant that if the mutations P312 or U106 appeared 5000-5500 years, it would have taken a few centuries before these lineages really kicked off and started making a few percent of the PIE population.

Oh OK, I suppose you're talking about clade ages then, which is obviously going to be older than TMRCAs of the extant population currently sampled. L11 is certainly older than the P312/U106 Nested Age, for example. And it's very possible that the correct TMRCA of these is at the high end of the error bar, or even outside of the error bar, I won't discount that at all.

The rest I agree with as a whole, good post.

Maciamo
24-11-11, 18:34
Oh OK, I suppose you're talking about clade ages then, which is obviously going to be older than TMRCAs of the extant population currently sampled. L11 is certainly older than the P312/U106 Nested Age, for example. And it's very possible that the correct TMRCA of these is at the high end of the error bar, or even outside of the error bar, I won't discount that at all.

The rest I agree with as a whole, good post.

No, I am talking about the TMRCA. It should be roughly 5000-5500 years before present for P312 and U106, although I am sure the correct TMRCA cannot be calculated accurately from STRs for various reasons (limited number of samples, bias towards Western European samples, a methodology that doesn't take into account historical variations in population size, etc.). I tried to explain that the age of each of these subclades could be from a few hundred to a few thousand years older than that (so possibly 8000 or 9000 years old if the lineages stagnated a long time before taking off).

sparkey
25-11-11, 08:48
No, I am talking about the TMRCA.

But then it's not really when it "appeared," that's where you confused me.


It should be roughly 5000-5500 ybp present for P312 and U106, although I am sure the correct TMRCA cannot be calculated accurately from STR for various reasons (limited number of samples, bias towards Western European samples, methodology doesn't take into account historical variations in population size, etc.).

Nordtvedt's methodology relies heavily on correct calculation of the modal haplotype, and a miscalculation could cause a systematic error. And I suppose you could argue that the "bias towards Western European samples" and the failure to "take into account historical variations in population size" create an incorrect modal calculation, but I don't see it. R1b, contrary to what you say, has a HUGE sample size. It wouldn't throw off the interclade TMRCA much anyway. I'm sorry, but I still fail to see where the systematic error lies.

Maciamo
25-11-11, 11:52
R1b, contrary to what you say, has a HUGE sample size. It wouldn't throw off the interclade TMRCA much anyway. I'm sorry, but I still fail to see where the systematic error lies.

Depends what you call huge. 50,000 samples would still be only 0.01% of the European population, even less if we include other Westerners of European descent and Asian R1b. It's easy to miss minor lineages isolated for a long time from the mainstream with less than 0.01% of the population sampled. What if some early U152 moved to some mountainous parts of the Balkans or Anatolia that have never been tested yet? There could be fewer than 100 people from such an early side lineage left somewhere very remote. Of course it all depends on whether your definition of TMRCA is based on the most recent common ancestor of the group of people tested only, or of all the people alive today belonging to that subclade. I think that you are using the first definition and me the second.
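Maciamo's sampling point can be put in numbers. This sketch assumes simple random sampling across the whole population, which real DNA projects are not, and uses the population size implied by his figure of 50,000 samples being 0.01% (500 million); the 100-man remote lineage is his hypothetical example.

```python
def prob_lineage_missed(lineage_size, population_size, n_samples):
    """Probability that none of n random samples comes from the lineage.

    Assumes independent random draws from the whole population.
    """
    fraction = lineage_size / population_size
    return (1.0 - fraction) ** n_samples


population = 500_000_000   # implied by "50,000 samples = 0.01%"
samples = 50_000
print(samples / population)                            # 0.0001, i.e. 0.01%
print(prob_lineage_missed(100, population, samples))   # about 0.99
```

Under these idealized assumptions, such a lineage would be missed about 99% of the time; a sample concentrated in Western Europe would, if anything, do worse for a lineage isolated in the Balkans or Anatolia.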

sparkey
28-11-11, 06:29
Depends what you call huge. 50,000 samples would still be only 0.01% of the European population, even less if we include other Westerners of European descent + Asian R1b. It's easy to miss minor lineages isolated for a long time from the mainstream with less than 0.01% of the population sampled.

I would say that 50,000 is a huge sample size, yes, although I agree that it could be biased. A biased sample in these calculations could knock off the modal, systematically bringing the TMRCA estimate down, or, if the modal is unbiased but the variance calculation is biased (more likely to be happening here IMHO), it could systematically make the error bars too small.


Of course it all depends on whether your definition of TMRCA is based on the most recent common ancestor of the group of people tested only, or of all the people alive today belonging to that subclade. I think that you are using the first definition and me the second.

OK, we were sort of talking past each other rather than having a real disagreement then... To be clear, these estimates are for the first definition of TMRCA.

Mikewww
29-11-11, 16:35
STR diversity is still instructive, even if only for positioning haplogroups in relation to each other. There is no doubt that mutations accumulate over time (number of generations.)

Busby's paper is used in some quarters to claim that all STR information is bad or should be avoided (boycotted). Why throw the baby out with the bath water?

Even Busby thinks STR diversity is very important and uses it for his core counter-argument to Balaresque. Here it is, from the conclusion section:

Alternatively, if R-S127 originated prior to the Neolithic wave of expansion, then either it was already present in most of Europe before the expansion, or the mutation occurred in the east, and was spread before or after the expansion, in which case we would expect higher diversity in the east closer to the origins of agriculture, which is not what we observe.

I also find a very strange anomaly in Busby's logic. In their critical calculations, they used ten STRs across all of the R1b-S127 (L11) data to analyze STR diversity by geography. I think ten is too limiting. Ken Nordtvedt argues that each STR is its own experiment, so increasing the number of STRs improves accuracy. He is a former member of the U.S. National Science Board, and I don't think you'll find many who can hold a candle to his grasp of statistics. That's not the anomaly, though.

Busby et al. say they selected the ten STRs based on their ability to correlate with time. Busby says this "θ(R)/2μ is an estimate of the duration of linearity". This is in "Table 1-STR theta estimates." If you cross-check duration of linearity against the ten STRs they used you'll find that six of the ten have durations of less than 5000 years. The large initial European Neolithic advances occurred roughly 7000 years ago so by their own definition their conclusions on STR diversity across Europe are not valid for arguing against Balaresque's Neolithic R1b hypothesis!
Look for yourself. No one has been able to explain this strange anomaly in Busby's paper.

Mikewww
29-11-11, 16:47
.... It wouldn't throw off the interclade TMRCA much anyway....
This is correct. This is the value of interclade TMRCA calculations. We are comparing two larger branches so missing some of the twigs on each individual branch is negated. In Generations6, Nordtvedt also used "nested variance" in the comparison of the two branches. This segregates the variance between the two and helps improve precision.

Still, all these estimates are only approximations.
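A common way to estimate the age of the node joining two branches is to compare haplotypes across the branches. The sketch below uses the classic average-squared-distance (ASD) approach between two clades under a symmetric single-step mutation model; it is a simpler, swapped-in technique, not Nordtvedt's nested-variance formula, and the haplotypes and mutation rates are hypothetical.

```python
from itertools import product


def interclade_generations(clade_a, clade_b, mutation_rates):
    """ASD-style estimate of generations back to the node joining two clades.

    Every member of clade_a is compared with every member of clade_b.
    Mutations accumulate on both branches below the shared node, so the
    expected squared repeat difference per marker grows by about 2 * mu
    per generation; hence the division by 2 * sum(mutation_rates).
    """
    pair_total = 0.0
    for hap_a, hap_b in product(clade_a, clade_b):
        pair_total += sum((x - y) ** 2 for x, y in zip(hap_a, hap_b))
    mean_squared_distance = pair_total / (len(clade_a) * len(clade_b))
    return mean_squared_distance / (2 * sum(mutation_rates))


# Hypothetical two-marker haplotypes, for illustration only.
branch_1 = [[13, 24], [13, 25]]
branch_2 = [[14, 24], [15, 24]]
rates = [0.002, 0.002]
node_generations = interclade_generations(branch_1, branch_2, rates)
```

It needs no modal haplotype, since it only compares individuals across the two branches.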

Mikewww
29-11-11, 17:28
... For all we know, P312 and U106 could have existed as a stagnating tiny minority of the PIE population in the Pontic Steppes for over a thousand years before the lineage takes off. If there were only a few men, or even a few dozen men carrying those mutations, and side branches got pruned regularly by wars and diseases, the STR variance would not have evolved much. There is no way to know that based on modern STR values.....
Here is another way to look at things rather than just looking at TMRCA estimates. Modal haplotypes are not the ancestral haplotypes, but they give us information on what those might have been.
From DNA projects, we now have over 1000 111-marker haplotypes from R-M269 people who have been deep-clade tested. Ysearch only holds 96 STRs, so I've loaded it with the following 96-marker modals:

2YYB6 R1b-L11* (S127*) Paragroup
XQJ7H R1b-P312 (S116) and all Subclades (this includes U152, L21 and Z196)
QM4ES R1b-U152 (S28) and all Subclades
K9VGV R1b-L21 (S145) and all Subclades
PEMD5 R1b-Z196 and all Subclades
N5PA5 R1b-U106 (S21) and all Subclades

If you were a genetic genealogist researching a set of families with the surname variants Richards, Ricardo, Rikkert and Rhisiart, and you found these GDs at 96 markers, what would you think?

L11(S127)* to P312(S116): 3
L11(S127)* to U152(S28): 2
L11(S127)* to L21(S145): 6
L11(S127)* to Z196: 6
L11(S127)* to U106(S21): 5

P312(S116) to U152(S28): 1
P312(S116) to L21(S145): 3
P312(S116) to Z196: 4
P312(S116) to U106(S21): 5

L21(S145) to U152(S28): 4
L21(S145) to Z196: 6

U152(S28) to Z196: 6

I don't know where this family started but they were closely related and apparently very assertive in expanding. Maybe that would explain the breadth of their proliferation. They were expanding over new ground quickly (with some advantage) rather than all staying at home and in-fighting.
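The GD figures above count how many repeat steps separate two haplotypes. A minimal sketch of that count, assuming the plain stepwise model for every marker (some services count certain markers differently), with hypothetical five-marker values rather than the actual Ysearch modals:

```python
def genetic_distance(hap_a, hap_b):
    """Stepwise genetic distance: summed absolute repeat differences."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


# Hypothetical values, NOT the modals listed above.
modal_x = [13, 24, 14, 11, 11]
modal_y = [13, 23, 14, 11, 13]
print(genetic_distance(modal_x, modal_y))  # 3
```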

sparkey
29-11-11, 18:53
Thanks for the informative responses, Mike. I appreciate your input on Busby in particular... that's some good analysis, including the criticism at the end, I haven't read about that before.


STR diversity is still instructive, even if only for positioning haplogroups in relation to each other. There is no doubt that mutations accumulate over time (number of generations.)

What do you think of geographic STR diversity analysis? That's not really what's going on here, but Dienekes has saved some of his sharper criticism for that in particular. Personally I think the criticism is rubbish for the reasons I outlined earlier, although I acknowledge that different circumstances can produce similar patterns unexpectedly, so our confidence in such analyses should be tempered.


This is correct. This is the value of interclade TMRCA calculations. We are comparing two larger branches, so the effect of missing some of the twigs on either branch is largely cancelled out. In Generations6, Nordtvedt also used "nested variance" in the comparison of the two branches. This segregates the variance between the two and helps improve precision.
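Why interclade comparisons are less sensitive to missing twigs can be shown with a generic average-squared-distance (ASD) estimator. This is NOT Nordtvedt's Generations6 formula (which adds the nested-variance refinement and error bars); it is a simpler textbook-style sketch assuming a symmetric single-step mutation model. Every pair made of one man from clade A and one from clade B traces back through the same node, so each such pair has an expected squared difference of `2 * mu * T` per marker, however bushy either branch is. The function name, the haplotypes and the rate are hypothetical.

```python
def interclade_generations(clade_a, clade_b, mu):
    """Generations back to the node joining two clades, from cross-clade ASD.

    clade_a, clade_b: lists of haplotypes (equal-length lists of repeat counts).
    mu: assumed average mutation rate per marker per generation.
    Under a symmetric single-step model, each a-b pair has
    E[(a - b)^2] = 2 * mu * T per marker, so T = ASD / (2 * mu).
    """
    n_markers = len(clade_a[0])
    total = 0
    pairs = 0
    for a in clade_a:
        for b in clade_b:
            total += sum((x - y) ** 2 for x, y in zip(a, b))
            pairs += 1
    asd = total / (pairs * n_markers)  # mean squared difference per marker
    return asd / (2 * mu)

# Hypothetical 4-marker haplotypes and an assumed rate, for illustration only
clade_a = [[13, 24, 14, 11], [13, 24, 15, 11]]
clade_b = [[14, 23, 14, 11], [13, 23, 14, 12]]
t = interclade_generations(clade_a, clade_b, mu=0.002)
print(f"about {t:.0f} generations to the node")
```

Under this model the expected value does not depend on how many men are sampled from each twig, although pairs that share branches are not independent, so the precision is harder to judge than the raw pair count suggests.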

Right, I think this needs to be highlighted, and I probably didn't do that enough in my initial post. Maciamo is probably right about sample bias... but it doesn't matter in the interclade TMRCA calculation. The nested variance calculation can have its error bars narrowed more than they should be with a biased sample, though, right? Am I understanding that correctly?


Here is another way to look at things, rather than just looking at TMRCA estimates. Modal haplotypes are not necessarily the ancestral haplotypes, but they give us information on what those might have been.

It's interesting that Nordtvedt uses modal as a byword for founding, though, or at least he has in the past. It's easy to imagine that the modal is a good approximation of the founder in most cases, at least if the calculation is done right and considers the fact that a descendant tree can be poorly balanced. But it's also possible to imagine modal calculations gone wrong. (A horror film for population geneticists?)

How likely do you think an error in the modal calculations is here? I see little chance of one...

Mikewww
30-11-11, 05:10
....
It's interesting that Nordtvedt uses modal as a byword for founding, though, or at least he has in the past. It's easy to imagine that the modal is a good approximation of the founder in most cases, at least if the calculation is done right and considers the fact that a descendant tree can be poorly balanced. But it's also possible to imagine modal calculations gone wrong. (A horror film for population geneticists?)...
Nordtvedt's Generations6-1 uses modal calculations in some of the output but not, according to Ken, in the "nested variance" calculations, which are the "new" part of his method. I'm not statistician enough to argue all of this. I would refer you to Ken's web site, where he describes the formulas, or to Rootsweb, where you can ask him directly yourself.

Mikewww
30-11-11, 05:29
....What do you think of geographic STR diversity analysis? That's not really what's going on here, but Dienekes has saved some of his sharper criticism for that in particular. ...
The whole idea of boycotting STR diversity analysis is going off the deep end, but it is important to keep it in perspective. It only provides approximations.

The topic of geographic STR diversity is fraught with gray areas. When we look at STR diversity within subclades, we are evaluating known clades with definite most recent common ancestors. When you start looking at STR diversity across geographies, you can't be sure that you are looking at people who are more closely related within one geography than within another. Anatole Klyosov calls this issue "phantom" common ancestors.

This does not mean I think all STR diversity analysis across geographies is worthless, just that you have to try to determine which geographies are origins or launch points versus crossroads or pooling points. STR diversity by geography is just another piece of data to use and cross-reference with archaeology, history, linguistic theory, etc.

... It is important to narrow the gray areas by breaking the haplotypes into the deepest subclades possible. On the other hand, subclade (haplogroup) diversity is another indicator, not just STR diversity, although the latter may imply the former.

sparkey
30-11-11, 19:11
This does not mean I think all STR diversity analysis across geographies is worthless, just that you have to try to determine which geographies are origins or launch points versus crossroads or pooling points. STR diversity by geography is just another piece of data to use and cross-reference with archaeology, history, linguistic theory, etc.

Thanks. This is exactly what I've been getting at, put more concisely. I would also add that the frequency of "pooling points" has almost certainly increased since the beginning of the Modern Age.

A. Tamar Chabadi
01-12-11, 11:43
To make Dienekes' stance clearer...I am forced to somewhat defend him here...hahaha

Dienekes in his own words:
"...my opinion of Y-STRs as a tool for inferring past population movements is, to put it mildly, low. When Bahamian Y-STR variance is higher than African one, and E-V13, one of the youngest European Y-haplogroups (in terms of Y-STR variance) turns up in Spain in one of the earliest ancient DNA samples, it goes without saying that the burden of proof is on those who wish to continue to talk about Neolithic or other population movements to make the assumptions of their models clearer. Nonetheless, there is still some utility in Y-STRs..."

Furthermore, he quotes the paper that I posted as a topic here...Herrera et al. 2011:

"From the paper:

However, owing to the contentions associated with the current calibrations of the Y-STR mutation rates,32,34,35,41 as well as the limitations of the assumptions utilized by the methodologies for time estimations, the absolute dates generated in this study should only be taken as rough estimates of upper bounds.

Indeed. We are at the point where Y-STRs are at the end of their utility, but the replacement technology of extensive Y-chromosome sequencing has not quite arrived in an economical way yet."

One last thing from "his own mouth":

"And, the story has other complications. From the current paper:

[The relative expansion times for haplogroup J2-M172 (Table 4) generally correspond with those yielded for R1b-M343, with the exception of Greece and Crete, which, unlike haplogroup R1b-M343, are slightly older than the dates yielded for several of the Near Eastern groups as well as the four Armenian populations.]



As mentioned above, I don't give much weight on Y-STR evidence, but observations such as the above certainly add to the feeling of unease that something is not quite right with the default picture of prehistory."

So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.

sparkey
01-12-11, 19:32
So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.

Let me be clear what I'm disagreeing with then... I disagree with these:


Y-STRs are effectively dead for age estimation

We are at the point where Y-STRs are at the end of their utility

...as well as some of the criticisms he presents to geographic diversity analysis. I mean, he says that "there is still some utility in Y-STRs" but then rejects any substantive analysis using them.

Let me address some of his points, briefly:


When Bahamian Y-STR variance is higher than African one...

This is a dumb criticism of Y-STR geographic diversity analysis, as I've said already. We expect greater diversity at, as Mike puts it, a "pooling point" like the Bahamas than in an Old World population with all the genetic bottlenecks and founder effects of the past.
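The "pooling point" argument follows from the law of total variance: the variance of a pooled sample equals the average variance within the source groups plus the variance between the group means. A minimal sketch with invented single-marker values for three hypothetical source populations, each with low variance (as after a founder effect) but a different modal; the helper name `pooled_vs_parts` is hypothetical.

```python
import statistics

def pooled_vs_parts(groups):
    """Compare variance within each group with the variance of the pooled sample.

    groups: dict {name: list of repeat counts at one marker}.
    Returns (dict of per-group variances, pooled variance).
    """
    per_group = {name: statistics.pvariance(vals) for name, vals in groups.items()}
    pooled = statistics.pvariance([v for vals in groups.values() for v in vals])
    return per_group, pooled

# Hypothetical values: each source is low-variance but centred elsewhere
groups = {
    "source 1": [13, 13, 13, 14],
    "source 2": [15, 15, 16, 15],
    "source 3": [11, 12, 11, 11],
}
per_group, pooled = pooled_vs_parts(groups)
print(per_group)  # each source: 0.1875
print(pooled)     # 0.1875 within + 8/3 between, about 2.854
```

The pooled sample shows far more variance than any source, without being any older.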


...and E-V13, one of the youngest European Y-haplogroups (in terms of Y-STR variance) turns up in Spain in one of the earliest ancient DNA samples...

Dienekes doesn't really understand what is known about E1b-V13 if he's calling it "one of the youngest European Y-haplogroups." It has certain expansions which are quite young, yes, and most European E1b folk are descended from very recent E1b founders, including a very young Southeastern European founder, but it isn't really a young clade. An interesting commentary is Steve Bird on King 2011 (http://community.haplozone.net/index.php?topic=2549.30).


...it goes without saying that the burden of proof is on those who wish to continue to talk about Neolithic or other population movements to make the assumptions of their models clearer.

I can agree with him on this, though.

Mikewww
02-12-11, 04:20
So as we can see, Dienekes does not think there is no utility for Y-STRs, just a very limited utility. Even the authors of the paper he was discussing acknowledge the uncertainty of Y-STRs.
So why does he say "STRs $%Sck"? Why does he say he is boycotting them? That's what I mean by going off the deep end, which leaves people with the wrong impression.

And by the way, if he thinks the utility of Y-STR diversity is so limited, what does he propose instead? No one is saying to use STR diversity in isolation. It is just another tool. Why throw it out? We know for sure that frequency can be very misleading in terms of origin. I know of a place where I think R-L21 frequency is very, very high, perhaps higher than in Ireland. It's O'Neill, Nebraska. Does that mean L21 originated there?
Of course not, but this is the same as the argument he uses about the Bahamas, etc. This kind of count-argument is an "overwhelming exception" logical fallacy. http://en.wikipedia.org/wiki/Overwhelming_exception

We have to use these tools together. Why throw out a vice or anvil? Alone, they may not solve many problems but with a hammer and fire one can forge metal.

The whole "burden of proof" argument is another logical fallacy. It's unreasonable to expect that we can prove much of anything (other than expansive generalities) about things that happened 4-10K years ago. We are all smart enough to handle ambiguity. We are looking for the most likely alternatives and trying to eliminate the others along the way.

bertrand
05-12-11, 16:59
I agree with Maciamo,
The arrival of a gene in a given region is NOT the date of birth of that gene. Given the high % of R1b subclades in today's population of Europe, I would think that they were present in more than one individual by 2500 BC.

For me the most likely scenario is still the appearance of R1b right before the ice age maximum around 25,000 BP, and the appearance of the subclades right after the end of the ice age around 10,000 BP, as the R1b tribes lived in the caves of Bashkortostan.

sparkey
05-12-11, 18:33
The arrival of a gene in a given region is NOT the date of birth of that gene.

Obviously, the first is always after the second.


Given the high % of R1b subclades in today's population of Europe, I would think that they were present in more than one individual by 2500 BC.

OK, but you'll need some evidence, like a serious challenge to these TMRCA estimates or ancient DNA, to prove it. "I would think" doesn't help much. Right now, 2500 BC is outside the one-sigma error bars for both P312 and U106, meaning that if these calculations are right, there were most likely no P312 or U106 carriers in 2500 BC.
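The check against 2500 BC is simple arithmetic on the one-sigma ranges from the table at the start of the thread. A minimal sketch, assuming "present" is the year 2000, which matches the thread's own conversion of 4.5K ybp to 2500 BC; the constant and helper names are hypothetical.

```python
PRESENT_CE = 2000  # assumed "present"; the thread equates 4.5K ybp with 2500 BC

def ybp_to_calendar(ky_bp, present=PRESENT_CE):
    """Thousands of years before present -> calendar year (negative = BC).

    Ignores the missing year zero, which does not matter at this precision.
    """
    return present - ky_bp * 1000

def contains(range_ky, year):
    """True if a calendar year lies inside an (older, younger) range in ky BP."""
    older, younger = range_ky
    return ybp_to_calendar(older) <= year <= ybp_to_calendar(younger)

# One-sigma ranges quoted at the start of the thread, in ky BP
ranges = {
    "U106 & P312 nested node": (5.2, 3.8),
    "P312 TMRCA": (4.4, 3.8),
    "U106 TMRCA": (4.0, 3.4),
}
for name, r in ranges.items():
    print(name, contains(r, -2500))
# U106 & P312 nested node True
# P312 TMRCA False
# U106 TMRCA False
```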


For me the most likely scenario is still the appearance of R1b right before the ice age maximum around 25,000 BP, and the appearance of the subclades right after the end of the ice age around 10,000 BP, as the R1b tribes lived in the caves of Bashkortostan.

Which subclades specifically do you think came out of Bashkortostan after the Ice Age? What evidence do you have for it?

Mikewww
06-12-11, 03:08
I agree with Maciamo. The arrival of a gene in a given region is NOT the date of birth of that gene. Given the high % of R1b subclades in today's population of Europe, I would think that they were present in more than one individual by 2500 BC. ...

I just wanted to clear up a couple of things about the estimates that Nordtvedt's methodology produces. The most important thing is that you should go to his web site and read through his charts documenting his formulas. http://knordtvedt.home.bresnan.net/

The following quotes are from a string of emails between Ken and me. They were private, but I don't think he'd mind me quoting him on this because they are just clarifications of his published method. Do go to his web site to understand everything in context.

1. The modals are NOT the basis for the interclade TMRCAs.

The modal for each clade is only for auxiliary purposes. It plays no role whatsoever in estimating the interclade node ages or the clade coalescence ages. Its use is only for two purposes: to evaluate some sigmas and to estimate (intra)clade TMRCAs, which I do not consider as good as the interclade node estimates. I inserted the "(intra)" because that is what I interpret his intent to be.

2. The interclade TMRCA estimate IS for the specific "node" man who is the Most Recent Common Ancestor of both clades (P312 & U106 in this case). It is NOT a coalescence age. It is estimating that one father-son event.

The interclade node age estimate is for a specific event in history: the age of the father of the two sons, each of whose descendant lines leads to one clade or the other.

3. His output includes coalescence ages, but they are clearly labeled as such. I interpret these ages as more akin to times of significant expansion.


Coalescence Age for a clade is a different thing. It does not estimate the time of a specific event. It is an abstract age and, in words, is the average TMRCA of all the pairs you can form from the clade haplotype sample collection in use.
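Ken's verbal definition ("the average TMRCA of all the pairs you can form") can be sketched directly. This is NOT his Generations6 calculation; as an assumption, each pair's TMRCA is estimated here with a simple single-step ASD estimator, `ASD / (2 * mu)` generations, and the haplotypes, rate and function name are hypothetical.

```python
from itertools import combinations

def coalescence_generations(haplotypes, mu):
    """Average pairwise TMRCA, in generations, over all pairs in a clade sample.

    Each pair's TMRCA is estimated as its per-marker ASD / (2 * mu), a simple
    single-step-model estimator assumed here for illustration.
    """
    n_markers = len(haplotypes[0])
    pair_tmrcas = []
    for a, b in combinations(haplotypes, 2):
        asd = sum((x - y) ** 2 for x, y in zip(a, b)) / n_markers
        pair_tmrcas.append(asd / (2 * mu))
    return sum(pair_tmrcas) / len(pair_tmrcas)

# Hypothetical 4-marker haplotypes and an assumed rate
sample = [[13, 24, 14, 11], [13, 23, 14, 11], [14, 24, 14, 12]]
print(coalescence_generations(sample, mu=0.002))  # pairs give 62.5, 125 and 187.5
```

Because this is an average over pairs, it depends on sample composition: a heavily sampled family contributes many young pairs and pulls the average down, which is one reason it is not the age of a single ancestor.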

4. Don't focus too much on the single most probable age. That undoubtedly is NOT the precise date of the MRCA. It is the range that counts.

In the example for this thread, "U106 & P312 Nested Age___4.5 __ (5.2-3.8)" gives a range of 5.2K to 3.8K years ago. That is the one-sigma range, so basically Ken's methodology is saying there is a 68% chance that the actual MRCA date falls in that range. That's all it is saying. 4.5K is just the most likely point in the whole range. Most people take a range like this and use the high end, but the truth is that the odds are as good that it is younger as that it is older.
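The "68%" figure and the equal odds on each side come from treating the estimate as roughly normal and symmetric, which is an assumption here, although this particular range happens to be symmetric (4.5 plus or minus 0.7). A minimal sketch of that arithmetic; the function name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """Probability that a normally distributed estimate lies within +/- k sigma."""
    return math.erf(k / math.sqrt(2))

# "U106 & P312 Nested Age___4.5 __ (5.2-3.8)", thousands of years before present
best, older, younger = 4.5, 5.2, 3.8
sigma = (older - younger) / 2  # half-width of the one-sigma range, 0.7K

tail = (1 - prob_within(1)) / 2
print(f"within 1 sigma: {prob_within(1):.3f}")    # 0.683
print(f"older than {older}K: {tail:.3f}")         # 0.159
print(f"younger than {younger}K: {tail:.3f}")     # 0.159
print(f"2-sigma range: {best + 2 * sigma:.1f}K to {best - 2 * sigma:.1f}K")
```

So, under this assumption, roughly one chance in six that the node is older than 5.2K, and the same chance that it is younger than 3.8K.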

I know some don't want to believe young ages like these, but this is just what the numbers show (and we do have a lot of numbers [long haplotypes] now). The real argument is over the mutation rates. I don't see why we wouldn't use the germ-line rates that we use in genealogical calculations, since Ken throws out the multi-copy STRs anyway... but this whole area is debatable.
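Why the mutation-rate debate dominates: under a single-step model, variance is roughly `mu * generations`, so any variance-based age scales as `1 / mu`. A minimal sketch with a hypothetical clade variance and two hypothetical rates chosen only to be a factor of three apart (the size of the disagreement mentioned in the Busby abstract earlier in the thread); neither rate is claimed to be the published germ-line or evolutionary value, and the 30-year generation is also an assumption.

```python
def age_years(variance, mu, generation_years=30.0):
    """Years back to the founder of a star-like clade.

    variance: average per-marker variance of the clade around its founder.
    mu: assumed mutation rate per marker per generation.
    Under the single-step model variance ~= mu * generations, so the
    estimate scales as 1 / mu.
    """
    return variance / mu * generation_years

variance = 0.3  # hypothetical clade variance
for label, mu in [("faster, germ-line-style rate", 0.0021),
                  ("slower, evolutionary-style rate", 0.0007)]:
    print(label, round(age_years(variance, mu)), "years")
```

The same data give ages a factor of three apart, so the choice of rate matters far more than the fine details of the variance calculation.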