Why it is wrong to assume that a haplogroup originated where it is most frequent now

Maciamo · Feb 10, 2010

1. Elevated modern frequency does not equal place of origin

Hundreds of times I have read people claiming that a haplogroup must have originated where it is most common today. It is an assumption that even professional geneticists make, and yet it is often mistaken.

One famous example is Y-haplogroup R1b. Up until recently most people, amateurs and professionals alike, thought that it must be native to western Europe because that is where it is found at the highest frequencies. The Genographic Project still hasn't changed its description of it. It reads : "30,000 years ago, a descendant of the clan making its way into Europe gave rise to marker M343, the defining marker of this haplogroup. These people dominated the human expansion into Europe, the Cro-Magnons."

There are plenty of other examples. Sometimes the place of highest frequency does coincide with the region of origin. This is usually true of subclades that have developed in an isolated region, or of relatively recent mutations. The first rule is : the older the haplogroup, the less likely its place of origin is to coincide with the place of highest frequency.

Y-haplogroup Q is found mostly in Siberia (Altai region) and among Native Americans. Judging from percentages alone it would be easy to jump to the wrong conclusion that it originated in the pre-Columbian Americas. This is admittedly a caricature of an example, as everybody knows that America was the last continent settled by humans. But the mistake made by National Geographic and plenty of others regarding R1b is just the same. That's why it is vital to look at the age of subclades and identify where the oldest version is found. In this case, Q*, the oldest form of Q, is found in Central Asia and the Middle East. This is unfortunately too wide an area to pinpoint a place of origin.

This leads to my second rule : Paleolithic people were not sedentary but moved all the time, making it pointless to try to define a small geographic region as a Paleolithic haplogroup's place of origin.

It is often in isolated regions with low population density that older versions of haplogroups survive. Apart from Q*, Central Asia is a great region for preserving haplogroups that have disappeared almost everywhere else (e.g. O*, P*, K*). The Caucasus is another good example. Its high mountains have isolated ethnic groups from one another for millennia. Unsurprisingly it is almost the only region where haplogroup F, a haplogroup that originated some 50,000 years ago, can still be found in high, and indeed sometimes very high frequencies. Isolation and the near absence of population inflow from the outside are one factor, but the small size of mountain populations is another : with fewer people, fewer of the mutations that create new haplogroups arise. It is not because F is found almost exclusively in the Caucasus nowadays that it originated there. It just survived there in its oldest form, but evolved elsewhere. 90% of the world population descends from F*. That's why it is important not to confuse modern distribution and place of origin. F almost certainly did not originate in the Caucasus, otherwise it would have remained stuck there rather than spreading to the whole world.

It is commonplace to refer to the southern Arabian peninsula as the place of origin of J1. After all, over 70% of men in Yemen belong to that haplogroup. But things are never as simple as they look at first sight. We have to ask : why did this haplogroup become dominant in that region, and not another one also found there ? It could have grown from a small group of original settlers in a virgin region. If the other haplogroups represent later immigrants, the first haplogroup present would have remained the dominant one, unless the new migrants came in huge numbers or killed the original inhabitants.

But how can we know if J1 in Yemen arrived first in an empty place or if it replaced indigenous haplogroups ? R1b did replace most of the older haplogroups in western Europe and so did O in East Asia (the aboriginal East Asians belonging to C and D).

Even if Yemen was uninhabited before J1, or all the older lineages became extinct, how can we know if the people who arrived were already J1 or were J* that later developed into J1 in Yemen, then re-expanded northward ? To answer this question geneticists will usually analyse the genetic diversity of various regions. Theoretically, the place where J1 originated would be the one where the most subclades and the highest STR variance can be observed. As we will now see, the theory always seems easier than it actually is in practice.

2. Genetic diversity does not equal place of origin

Another common mistake is to think that a haplogroup's place of origin corresponds to the area where it has the greatest genetic diversity (e.g. microsatellite diversity, number of subclades). For example, if region A has 10 different subclades for a haplogroup, with a converging age going back 15,000 years, but region B has only 2 subclades and a TMRCA going back only 8,000 years, then region A is more likely to be the place of origin. The concept is attractive, but unfortunately too simple. It doesn't take into account two essential factors : 1) the population size of a region and 2) the region's history (invasions, migrations, genocides).

First, one has to consider the historical and present population size of the regions studied. New mutations, and therefore new genetic diversity, arise about 100 times more often in a population of one million individuals than among 10,000 persons, simply because there are 100 times as many births in which a mutation can occur. Take 200 men belonging to the same Y-haplogroup and divide them into two groups, one in an unfriendly environment with little food (e.g. Siberia) and the other in a pleasant climate favourable to agriculture (e.g. India). Let's say that after a thousand years the first group will have 1,000 descendants carrying the same haplogroup, while the second will have 100,000 descendants. Population growth is assumed to be constant, with no major war, famine or epidemic causing a population bottleneck in between. In this theoretical example (because wars, famines and epidemics do happen) it is easy to see why the second group should have a much greater genetic diversity than the first one after one thousand years, although they both descend from the same original lineage !
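To put rough numbers on the thought experiment above, here is a minimal Python sketch. It is only an illustration: the mutation chance `MU`, the 25-year generation length and the smooth exponential growth curve are assumptions chosen for the example, not figures from this thread, and the helper names are hypothetical. It counts how many father-to-son transmissions happen in each group and multiplies by the assumed chance of a new mutation per transmission.

```python
# Illustrative sketch only: MU, YEARS_PER_GENERATION and the smooth growth
# curve are assumptions made for this example, not figures from the thread.

MU = 0.002                 # hypothetical chance of a new mutation per father-son transmission
YEARS_PER_GENERATION = 25  # assumed generation length
GENERATIONS = 1000 // YEARS_PER_GENERATION  # 40 generations in a thousand years


def expected_new_mutations(population, mu=MU):
    """Expected number of new mutations arising in a single generation."""
    return population * mu


def cumulative_births(n_start, n_end, generations=GENERATIONS):
    """Total male births over all generations, assuming constant exponential growth."""
    growth = (n_end / n_start) ** (1 / generations)
    return sum(n_start * growth ** g for g in range(1, generations + 1))


# One million men versus 10,000 men within a single generation.
ratio_per_generation = expected_new_mutations(1_000_000) / expected_new_mutations(10_000)

# The two groups of 100 founders after a thousand years.
siberia_births = cumulative_births(100, 1_000)
india_births = cumulative_births(100, 100_000)

print(f"Per-generation ratio: {ratio_per_generation:.0f}")
print(f"Expected new mutations, Siberia group: {siberia_births * MU:.0f}")
print(f"Expected new mutations, India group:   {india_births * MU:.0f}")
```

Whatever value is picked for `MU`, it cancels out of the comparison: the fast-growing group accumulates more new mutations purely because more births take place in it, not because it has been in the region any longer.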

This is probably what happened with such haplogroups as R1a1a (Y-DNA) and U2 (mtDNA) in India, as opposed to their likely place of origin in the Eurasian steppe. The same thing happened with haplogroup O, which most probably originated in Central Asia, but gained a great diversity in the much more fertile lands of East and South-East Asia.

The second fundamental point that should never be overlooked is regional history. How often was a region invaded ? Was it settled just once like Iceland or Polynesia, or was it constantly overrun by nomadic neighbours like the Balkans, the Middle East, or northern China ? Did people migrate en masse to other places like the Germanic tribes in the 5th century, or was it a place where people came to settle like Italy or Anatolia ? Did invaders massacre the locals that they saw as inferior, or did they just take over power as new rulers of a well-established kingdom/empire that they regarded as superior to their own culture ? Did some kind of apartheid happen between more developed newcomers and more primitive indigenes, as was probably the case in Europe between Mesolithic hunter-gatherers and Neolithic farmers/herders ?

These are all essential questions that one should ask when studying population genetics. Unfortunately they are typically the least considered elements by professional geneticists, who tend to have a very poor background in history and archaeology.

Regions that nomads saw as attractive to plunder, conquer or resettle will undoubtedly have inherited some of these invaders' haplogroups. In Europe and the Middle East the most advanced societies from the Neolithic until the Renaissance (c. 8500 BCE to 1500 CE, so a period of 10,000 years) were found between Mesopotamia and the Balkans (+ Italy from the heyday of the Roman Empire). This region also maintained the largest populations in the biggest cities outside India and China.

Furthermore, the biggest reservoir for nomadic incursions was just across the Caucasus : the huge Eurasian steppe, ranging from the Danube estuary to Central Asia and Mongolia. Mesopotamia has the world's longest recorded history, and it is but a succession of invasions by steppe peoples, be they Indo-European, Mongolian or Turkic. It is therefore unsurprising that a high level of genetic diversity from steppe haplogroups (such as R1a1a, and probably also R1b1b) should be found between Mesopotamia and the Balkans. Note that Egypt was better preserved from these invasions thanks to its distance and geographic isolation.

Some geneticists have argued for the Middle East or the Balkans as the place of origin of R1a1a or R1b1b based on the genetic diversity found in those regions (the Balkans for R1a1a and the Levant or Mesopotamia for R1b1b). This is however likely to be just the result of millennia of steppe invasions. The same phenomenon can be observed in India/Pakistan with R1a1a. How better can we explain the great genetic diversity of R1a1a in such distant places as the Balkans and the Indian subcontinent if not as a result of waves of migrations by different steppe people from the Bronze Age onwards ? Add to this that the migrants would have had more offspring in the newly conquered fertile lands than in their native steppes, and that explains it all.

Segia · Feb 10, 2010

Some questions:

How did the demography of the steppes compare with that of urban Neolithic societies? How big was the impact of the Hunnic and Mongol invasions at a genetic level?
How can the place of origin of a mutation be established?

LeBrok · Feb 10, 2010

I like your logic Maciamo, it makes good sense.

Maciamo · Feb 10, 2010

Segia said:
Some questions:
How did the demography of the steppes compare with that of urban Neolithic societies? How big was the impact of the Hunnic and Mongol invasions at a genetic level?
How can the place of origin of a mutation be established?

The Huns and Mongols had a pretty minor impact on the European population because they arrived late, when Europe's population was already well established in cities and towns. The Mongols had a much bigger impact on Central Asia (especially Kazakhstan, where Mongols now constitute over half of the population), northern Iran and Afghanistan (e.g. Hazara people).

Those who had the biggest impact were the earliest, i.e. the Bronze-Age invaders. For a more detailed explanation, please read the R1b and R1a sections on this website.

St Delcambre · Feb 13, 2010

Very interesting read.

rogers · Mar 12, 2010

Take for example Y-chromosome haplogroup I*.

Haplogroup I probably originated in the West Balkans (Croatia-Bosnia). The highest frequency (> 70%) of this haplogroup is still found in the same region of Europe, but it is now in the form of the subclade I2a2.

In fact there was a recent migration of I2a2 individuals from somewhere in Northern Europe to this area, as the subclade is very young (3,000 years old). So it looks like there was a back-migration of haplogroup I into Croatia-Bosnia that happened only recently.

willy · Mar 18, 2010

Maciamo said:
The same phenomenon can be observed in India/Pakistan with R1a1a. How better can we explain the great genetic diversity of R1a1a in such distant places as the Balkans and the Indian subcontinent if not as a result of waves of migrations by different steppe people from the Bronze Age onwards ? Add to this that the migrants would have had more offspring in the newly conquered fertile lands than in their native steppes, and that explains it all.

First: where is this "Eurasian steppe" for you?

Second: R1a1a in India is not as simple as you think.

Third: there is a great deal of confusion between haplogroups, Indo-Europeans, dates and migrations!

Joro · Mar 19, 2010

rogers said:
Take for example Y-chromosome haplogroup I*.

Haplogroup I probably originated in the West Balkans (Croatia-Bosnia). The highest frequency (> 70%) of this haplogroup is still found in the same region of Europe, but it is now in the form of the subclade I2a2.

In fact there was a recent migration of I2a2 individuals from somewhere in Northern Europe to this area, as the subclade is very young (3,000 years old). So it looks like there was a back-migration of haplogroup I into Croatia-Bosnia that happened only recently.

Maybe that could explain the ~10% of I1 in Croatia? Those back-migrants?

rogers · Apr 25, 2010

Joro said:
Maybe that could explain the ~10% of I1 in Croatia? Those back-migrants?

Yes, I think that this theory fits the evidence best.

MOESAN · Dec 4, 2011

rogers said:
Take for example Y-chromosome haplogroup I*.

Haplogroup I probably originated in the West Balkans (Croatia-Bosnia). The highest frequency (> 70%) of this haplogroup is still found in the same region of Europe, but it is now in the form of the subclade I2a2.

In fact there was a recent migration of I2a2 individuals from somewhere in Northern Europe to this area, as the subclade is very young (3,000 years old). So it looks like there was a back-migration of haplogroup I into Croatia-Bosnia that happened only recently.

Sorry to join this thread so late. It is a very interesting thread that deals not only with data but with ways of thinking (too few people do that: we rely far too much on scholars and scientists, even when they show intellectual antagonism and agendas).

As for Y-I, I'm not at all sure it was born in the North-West Balkans or the Dinaric Alps: as far as I know, very few remains of human settlements from the LGM have been discovered there. What it seems is that most of the people outside western or far eastern Europe/E. Asia were either in Czechoslovakia, South Poland or the Carpathians. At the height of the LGM I bet Y-I1 was between Moravia and Poland, maybe further east in Belarus or Russia (some scholars say part of the I1 of Finland could have come from south of the Baltic Sea and not only from eastern Scandinavia), and the ancestors of Y-I2a1a (ex-I2a2) were between the Carpathians and Moldavia (a lot of maybes, I know, but how can one be more affirmative?). I2a1a (ex I2a1) could have been between Bohemia and the future Veneto (N-E Italy) before sailing to Spain and Sardinia?
I could very well see a fairly old presence of Y-E1b V18 in the Dinaric Alps, before a first infiltration of I2a2b from, say, the Carpathians or Moldavia (growing power of Cucuteni-Tripolje?) at the end of the Eneolithic and beginning of the Chalcolithic (maybe mixed with Y-E1b), and a second infiltration (rush?) of I2a2 mixed with a lot of R1a (and some other minor haplogroups) in the Slavic period.

BUT WHAT I AM SAYING IS THAT THE DINARIC ALPS AND EVEN GREECE SHOW POOR EVIDENCE OF PRE-NEOLITHIC SETTLEMENT (maybe some new surveys contradict that? I'm not aware of any).
I admit all of this is still guesswork.

MOESAN · Dec 4, 2011

Joro said:
Maybe that could explain ~10% of I1 in Croatia?Those back migrants?

I prefer to see most of the Y-I1 bearers as having come to the Balkans with various historical 'Germanic' events.

MOESAN · Dec 4, 2011

MOESAN said:
Sorry to join this thread so late. It is a very interesting thread that deals not only with data but with ways of thinking (too few people do that: we rely far too much on scholars and scientists, even when they show intellectual antagonism and agendas).

As for Y-I, I'm not at all sure it was born in the North-West Balkans or the Dinaric Alps: as far as I know, very few remains of human settlements from the LGM have been discovered there. What it seems is that most of the people outside western or far eastern Europe/E. Asia were either in Czechoslovakia, South Poland or the Carpathians. At the height of the LGM I bet Y-I1 was between Moravia and Poland, maybe further east in Belarus or Russia (some scholars say part of the I1 of Finland could have come from south of the Baltic Sea and not only from eastern Scandinavia), and the ancestors of Y-I2a1a (ex-I2a2) were between the Carpathians and Moldavia (a lot of maybes, I know, but how can one be more affirmative?). I2a1a (ex I2a1) could have been between Bohemia and the future Veneto (N-E Italy) before sailing to Spain and Sardinia?
I could very well see a fairly old presence of Y-E1b V18 in the Dinaric Alps, before a first infiltration of I2a2b from, say, the Carpathians or Moldavia (growing power of Cucuteni-Tripolje?) at the end of the Eneolithic and beginning of the Chalcolithic (maybe mixed with Y-E1b), and a second infiltration (rush?) of I2a2 mixed with a lot of R1a (and some other minor haplogroups) in the Slavic period.

BUT WHAT I AM SAYING IS THAT THE DINARIC ALPS AND EVEN GREECE SHOW POOR EVIDENCE OF PRE-NEOLITHIC SETTLEMENT (maybe some new surveys contradict that? I'm not aware of any).
I admit all of this is still guesswork.

I would add that I2a1b in its cradle was a small group of Y-I2* descendants that remained a sparse population for a long time, which could explain its low variance and apparent "young age". It is not ridiculous to imagine that they experienced a first demographic boom after learning agriculture... Some other Y-I2* descendants could have wintered further west (Slovakia? Bohemia too) in slightly larger numbers, being the ancestors of I2a2-Isles??? (and relatives of I2a1a too).

MOESAN · Dec 4, 2011

I would add that the I2a1b descendants of Y-I2* in the cradle area must have been a very sparse population for a long time before experiencing a 'baby boom' with the learning of agriculture. That could explain their "young age" and low variance. Their cousins (I2* >> I2a1b-Isles) could have lived further west in Czechoslovakia, within a slightly more numerous population, but that is just a hunch.

Knovas · Dec 4, 2011

I2a1a* (the one you call ex-I2a1*) is unlikely to have originated in North-East Italy. The most likely and most widely accepted place of origin right now is the Pyrenees. From there, the expansion to other regions, including Sardinia, makes much more sense.

sparkey · Dec 5, 2011

MOESAN said:
I would add that I2a1b in its cradle was a small group of Y-I2* descendants that remained a sparse population for a long time, which could explain its low variance and apparent "young age". It is not ridiculous to imagine that they experienced a first demographic boom after learning agriculture... Some other Y-I2* descendants could have wintered further west (Slovakia? Bohemia too) in slightly larger numbers, being the ancestors of I2a2-Isles??? (and relatives of I2a1a too).

The defining SNPs for I2a1b (L178, etc.) had probably all arisen by 12,000 years ago, before agriculture had spread to the area, and we don't really have any clue where the MRCA of I2a1b lived, owing to its weird distribution that sees it split between Northwestern Europe (especially Britain) and Eastern Europe. Nordtvedt's wild guess on his map places it around Poland, but obviously we can't be sure. Its TMRCA dating, other than maybe that of its I2a1b2-Isles subclade, is not as favorable to an apparent agricultural boom as that of some other I subclades, most notably I2a1a. I2a1b1a-Din in particular is much too young to be connected to agriculture.

My Paleolithic Remnants map is intended to give some idea of where the modern centers of diversity are for subclades that arose by 6000 years ago or so (basically, by the end of the arrival of the Neolithic in Europe). Perhaps it helps here.

MOESAN · Dec 9, 2011

sparkey said:
The defining SNPs for I2a1b (L178, etc.) had probably all arisen by 12,000 years ago, before agriculture had spread to the area, and we don't really have any clue where the MRCA of I2a1b lived, owing to its weird distribution that sees it split between Northwestern Europe (especially Britain) and Eastern Europe. Nordtvedt's wild guess on his map places it around Poland, but obviously we can't be sure. Its TMRCA dating, other than maybe that of its I2a1b2-Isles subclade, is not as favorable to an apparent agricultural boom as that of some other I subclades, most notably I2a1a. I2a1b1a-Din in particular is much too young to be connected to agriculture.

My Paleolithic Remnants map is intended to give some idea of where the modern centers of diversity are for subclades that arose by 6000 years ago or so (basically, by the end of the arrival of the Neolithic in Europe). Perhaps it helps here.

I read your Paleolithic Remnants map and found it interesting.
Your thoughts about the present-day locations and the places of origin of the haplogroups' centers of high diversity are close to mine.
For that very reason, since I know people did not always stay in the same place, I think the close ancestors of Y-I2a1a, Y-I2a1* and I2a*, could have taken a route from Central Europe to N-E Italy before arriving in Iberia (by sea? more likely through France). It is difficult to explain the presence of old Y-I2a1* in N-E Italy in any other way. All of that could have occurred long before Neolithic agriculture...

sparkey · Dec 9, 2011

MOESAN said:
I read your Paleolithic Remnants map and found it interesting.
Your thoughts about the present-day locations and the places of origin of the haplogroups' centers of high diversity are close to mine.
For that very reason, since I know people did not always stay in the same place, I think the close ancestors of Y-I2a1a, Y-I2a1* and I2a*, could have taken a route from Central Europe to N-E Italy before arriving in Iberia (by sea? more likely through France). It is difficult to explain the presence of old Y-I2a1* in N-E Italy in any other way. All of that could have occurred long before Neolithic agriculture...

I2a1* is very, very old, nearly 20,000 years old per Nordtvedt's latest estimates. Its location and movement so long ago are too tough to guess right now IMHO. One thing that simplifies the "Y-I2a1* in N-E Italy" question is that the only I2a1 clade that seems to be ancient there is the ~4000 year old I2a1c*-Alpine clade, which has a fairly close relative, I2a1c1-Western, which also has a center of diversity not too far from the Rhine. That makes me think that the haplogroup mixture along the Rhine just before the beginning of the Neolithic in the area was a pretty good mix of I2a1c, I2a2b, and I2c. As for I2a1a, who knows... it's about 18,000 years from I2a1c. I guess it would have just been a normal East-West walking migration.

MOESAN · Dec 11, 2011

sparkey said:
I2a1* is very, very old, nearly 20,000 years old per Nordtvedt's latest estimates. Its location and movement so long ago are too tough to guess right now IMHO. One thing that simplifies the "Y-I2a1* in N-E Italy" question is that the only I2a1 clade that seems to be ancient there is the ~4000 year old I2a1c*-Alpine clade, which has a fairly close relative, I2a1c1-Western, which also has a center of diversity not too far from the Rhine. That makes me think that the haplogroup mixture along the Rhine just before the beginning of the Neolithic in the area was a pretty good mix of I2a1c, I2a2b, and I2c. As for I2a1a, who knows... it's about 18,000 years from I2a1c. I guess it would have just been a normal East-West walking migration.

That seems a very well-informed answer.
I'm not up to date on these clades of I2a1 - I'll go fishing.

Beowuld · Jan 10, 2012

Thank you for sharing your extensive and up-to-date knowledge.

I am new here, but I am very interested in this topic. I've read some books by Gwyn Jones, JP Mallory (excellent), Cavalli-Sforza, and the books by B Sykes. I found Sykes' books somewhat unconvincing (especially Blood of the Isles). Since I'm not an expert I could not tell why. The data you have written about R1b nicely undermine Sykes' book, and give me reason to discredit his findings.

I have some questions for Maciamo:

1. Cavalli-Sforza and Sykes guesstimated that about 20-30% or so of DNA in Europe is Indo-European (IE). I guess these figures must now be revised upward for Y chromosomes, although they would be more correct for mtDNA. What's your "guesstimate" for Europe today?

2. I was intrigued by the pre-IE haplogroup I in Scandinavia and by how you said that this explains a pre-IE migration from Sweden to Finland.

3. Have you read "The 10,000 Year Explosion"? The authors of that book argue that the spread of IE was vastly accelerated by the lactase mutation, which they believe arose in the proto-IE people of the steppe. Then there arises the question of why this mutation would be most prevalent among some of the least genetically IE people of Europe, the Scandinavians. I guess the explanation is that genes that confer a reproductive advantage don't follow the same patterns as silent mutations in Y chromosomes. What do you think?

4. Are there any new books which encompass the new knowledge you have written about?
Will you write this book?
Is population genetics your profession?
