Paleolithic mammoth hunters
Haplogroup R* originated in North Asia just before the Last Glacial Maximum (26,500-19,000 years before present). This haplogroup has been identified in the 24,000-year-old remains of the so-called "Mal'ta boy" from the Lake Baikal region, in south-central Siberia (Raghavan et al. 2013). This individual belonged to a tribe of mammoth hunters that may have roamed across Siberia and parts of Europe during the Paleolithic. Autosomally, this Paleolithic population appears to have contributed mostly to the ancestry of modern Europeans and South Asians. These are also the two regions where haplogroup R is most common nowadays (R1b in Western Europe, R1a in Eastern Europe, Central and South Asia, and R2 in South Asia).
Haplogroup R1a probably branched off from R1* during or soon after the Last Glacial Maximum. Little is known for certain about its place of origin. Some think it might have originated in the Balkans or around Pakistan and Northwest India, because of the greater genetic diversity found in these regions. This diversity can be explained by other factors, though. The Balkans have been subject to 5000 years of migrations from the Eurasian Steppes, each bringing new varieties of R1a. South Asia has had a much bigger population than any other part of the world (occasionally equalled by China) for at least 10,000 years, and larger populations bring about more genetic diversity. The most likely place of origin of R1a is Central Asia or southern Russia/Siberia.
From there, R1a could have migrated directly to eastern Europe (European Russia, Ukraine, Belarus), or first southward through Central Asia and Iran. In the latter scenario, R1a would have crossed the Caucasus during the Neolithic, alongside R1b, to colonise the Pontic-Caspian Steppe. In the absence of ancient Y-DNA from those regions, the best evidence supporting a Late Paleolithic migration to Iran is the presence of very old subclades of R1a (like M420) in the region, notably in the Zagros Mountains. However, these samples make up only a fraction of all R1a in the region. They could just as well represent the descendants of Eastern European hunter-gatherers who branched off from other R1a tribes and crossed from the North Caucasus at any time between 20,000 and 8,000 years ago. The logic behind this is that most known historical migrations in Eurasia took place from north to south, as people sought warmer climes. The only exception happened during the Holocene warming of the climate, which corresponds to the Neolithic colonisation of Europe from the Near East. A third possibility is that R1a tribes split in two around Kazakhstan during the Late Paleolithic, with one group moving to eastern Europe and the other moving south to Iran.
|Did R1a come to Europe with Neolithic farmers?|
Some people have theorized that R1a was one of the lineages of the Neolithic farmers. In this scenario it would have entered Europe through Anatolia, then spread across the Balkans toward Central Europe, and only then to Eastern Europe. There are many issues with this scenario. The first is that 99% of modern R1a descends from R1a1a1 (M417), a subclade that clearly expanded from the Bronze Age onwards, not from the early Neolithic. Its phylogeny also points to an Eastern European origin. Secondly, most of the R1a in the Middle East belongs to deep subclades of the R1a-Z93 branch, which originated in Russia (see below). It could not have been ancestral to Balkanic or Central European R1a. Thirdly, there is a very strong correlation between the Northeast European autosomal admixture and R1a populations. This component is missing from the genome of all European Neolithic farmers tested to date, even from Ötzi, who was a Chalcolithic farmer. This admixture is also missing from modern Sardinians, who are mostly descended from Neolithic farmers. This is incontrovertible evidence that R1a did not come to Europe with Neolithic farmers.
Bronze Age Proto-Indo-Europeans
R1a is thought to have been the dominant haplogroup among the northern and eastern Proto-Indo-European speakers, who evolved into the Indo-Iranian, Thracian, Baltic and Slavic branches. The Proto-Indo-Europeans originated in the Yamna culture (3300-2500 BCE). Their dramatic expansion was made possible by an early adoption of bronze weapons and the domestication of the horse in the Eurasian steppes (circa 4000-3500 BCE). The southern Steppe culture is believed to have carried predominantly R1b (M269 and M73) lineages, while the northern forest-steppe culture would have been essentially R1a-dominant. The first expansion of the forest-steppe people occurred with the Corded Ware Culture (see Germanic branch below). The migration of the R1b people to central and Western Europe left a vacuum for R1a people in the southern steppe around the time of the Catacomb culture (2800-2200 BCE). The forest-steppe origin of this culture is obvious from the introduction of corded pottery and the abundant use of polished battle axes, the two most prominent features of the Corded Ware culture. This is also probably when the satemisation process of the Indo-European languages began. The Balto-Slavic and Indo-Iranian language groups belong to the same Satem isogloss, and both appear to have evolved from the Catacomb culture.
Ancient DNA testing has confirmed the presence of haplogroup R1a1a in samples from the Corded Ware culture in Germany (2600 BCE), from Tocharian mummies (2000 BCE) in Northwest China, from Kurgan burials (circa 1600 BCE) of the Andronovo culture in southern Russia and southern Siberia, and from a variety of Iron Age sites in Russia, Siberia, Mongolia and Central Asia.
Distribution of haplogroup R1a in Europe
Distribution of haplogroup R1a-M458 in Europe
Distribution of haplogroup R1a-M558 (CTS1211) in Europe
Distribution of haplogroup R1a-Z93 in Eurasia
Nowadays, high frequencies of R1a are found in Poland (57.5% of the population), Ukraine (40 to 65%), European Russia (45 to 65%), Belarus (51%), Slovakia (42%), Latvia (40%), Lithuania (38%), the Czech Republic (34%), Hungary (32%), Norway (27%), Austria (26%), Croatia (24%), north-east Germany (24%), Sweden (19%), and Romania (18%).
Subclades & Haplotypes
99% of R1a people belong to subclades of R1a1a1 (R1a-M417), which is divided into the following subclades:
- R1a-L664 is essentially Northwest European, found chiefly in West Germany, the Low Countries and the British Isles.
- R1a-Z645 makes up the bulk of R1a individuals from Central Europe to South Asia.
  - R1a-Z283 is the main Central & East European branch.
    - R1a-Z284 is a Scandinavian subclade with an epicentre in Norway. It is also found in places colonised by the Norwegian Vikings, like some parts of Scotland, England and Ireland. Several subclades have been identified, including L448, L176.1, Z287/Z288, Z66 and Z281, about which little is known at the moment.
    - R1a-M458 is primarily a Slavic subclade, with maximum frequencies in Poland, the Czech Republic and Slovakia, but it is also fairly common in southeast Ukraine and northwest Russia.
      - Its subclade R1a-L260 is clearly West Slavic, with a peak of frequency in Poland, the Czech Republic and Slovakia, and radiating at lower frequencies into East Germany, East Austria, Slovenia and Hungary.
    - R1a-Z280 is also a Balto-Slavic marker, found all over central and Eastern Europe (except in the Balkans), with a western limit running from eastern to south-western Germany and on to north-eastern Italy. It can be divided into many clusters: East Slavic, Baltic, Pomeranian, Polish, Carpathian, East-Alpine, Czechoslovak, and so on.
      - Its subclade R1a-L365 is a Pomeranian cluster also found in southern Poland.
  - R1a-Z93 is the main Asian branch of R1a. It is found in Central Asia, South Asia and Southwest Asia (including among Ashkenazi Jews). R1a-Z93 is the marker of historical peoples such as the Indo-Aryans, Persians, Medes, Mitanni, or Tatars, and pervaded the genetic pool of the Arabs and Jews.
    - Its subclade R1a-M434 makes up a small percentage of the population of Pakistan. Traces have also been found in Oman.
A lot of Western and Northern European R1a that is negative for the marker Z284 falls under the root R1a1a1* (M417), or even under the older R1a1a (M17) and R1a1 (SRY10831.2). The former are descended from the oldest known expansion of R1a out of the Forest-Steppe, the Corded Ware Culture (see below), which predates all the above subclades. At present no subclade has been identified by a common SNP. However, Klyosov et al. (2009) found that a substantial percentage of R1a in Northwest Europe, particularly in Norway, England, Ireland and Iceland, had a repeat value of 10 (instead of 12) at the STR marker DYS388. Some of these individuals were identified as carrying the mutation L664. The origin of the older subclades (M17 and SRY10831.2) is still unclear (perhaps Mesolithic hunter-gatherers roaming around Europe).
History of R1a
The Germanic branch
The first major expansion of R1a took place with the westward propagation of the Corded Ware (or Battle Axe) culture (2800-1800 BCE) from the northern forest-steppe in the Yamna homeland. This was the first wave of R1a into Europe, the one that brought the Z283 subclade to Germany and the Netherlands, and Z284 to Scandinavia. The Corded Ware R1a people would have mixed with the pre-Germanic I1 and I2 aborigines. This resulted in the first Indo-European culture in Germany and Scandinavia, although that culture could not be considered Proto-Germanic. It was simply Proto-Indo-European at that stage, or perhaps Proto-Balto-Slavic.
Germanic languages probably did not appear before the Nordic Bronze Age (1800-500 BCE). The Proto-Germanic language probably developed as a blend of two branches of Indo-European languages, namely the Proto-Balto-Slavic language of the Corded Ware culture (R1a-Z283) and the later arrival of Proto-Italo-Celto-Germanic people from the Unetice culture (R1b-L11). This is supported by three facts: Germanic people are an R1a-R1b hybrid, these two haplogroups came via separate routes at different times, and the Proto-Germanic language is closest to Proto-Italo-Celtic but also shares similarities with Proto-Slavic.
The R1b branch of the Indo-Europeans is thought to have originated in the southern Yamna culture (northern shores of the Black Sea). It was the first one to move from the steppes to Europe, invading the Danube delta around 4200 BCE, then making its way around the Balkans and the Hungarian plain in the 4th millennium BCE. It is likely that a minority of R1a people accompanied this R1b migration. Those R1a men would have belonged to the L664 subclade, the first to split from the Yamna core. These early steppe invaders were not a homogeneous group, but a cluster of tribes. It is possible that the R1a-L664 people were one or several separate tribes of their own, or that they mixed with some R1b lineages, notably R1b-U106, which would become the main Germanic lineage many centuries later. The R1b-R1a contingent moved up the Danube to the Pannonian plain around 2800 BCE, brought to an end the local Bell Beaker (circa 2200 BCE) and Corded Ware (c. 2400 BCE) cultures in Central Europe, and set up the Unetice culture (2300-1600 BCE) around Bohemia and eastern Germany. Unetice can be seen as the source of future Germanic, Celtic and Italic cultures, and is associated with the L11 subclade of R1b.
The late Unetice culture expanded to Scandinavia, founding the Nordic Bronze Age. R1a-L664 and R1b (L11 and U106) presumably reached Scandinavia at this time. People from the Nordic Bronze Age probably spoke a Proto-Germanic language, which for over a thousand years acquired vocabulary from the indigenous Corded Ware language, itself a mixture of Proto-Balto-Slavic and non-IE pre-Germanic. Linguists estimate that the first genuine Germanic tongue came into existence around (or after) 500 BCE, just as the Nordic Bronze Age came to an end, giving way to the Pre-Roman Iron Age. The uniqueness of some of the Germanic vocabulary points to borrowing from native pre-Indo-European languages (Germanic substrate theory). The Celtic languages themselves have been argued to have borrowed from Afro-Asiatic languages spoken by Near-Eastern immigrants to Central Europe. Present-day Scandinavia is composed of roughly 40% I1, 20% R1a and 40% R1b. This reinforces the idea that the Germanic ethnicity and language had acquired a tri-hybrid character by the Iron Age.
The Baltic branch
The Baltic branch is thought to have evolved from the Fatyanovo culture (3200-2300 BCE), the northeastern extension of the Corded Ware culture. Early Bronze Age R1a nomads from the northern steppes and forest-steppes would have mixed with the indigenous Uralic-speaking inhabitants (N1c1 lineages) of the region. This is supported by a strong presence of both R1a and N1c1 haplogroups from southern Finland to Lithuania and the adjacent part of Russia.
The Slavic branch
The origins of the Slavs go back to circa 3500 BCE with the northern Yamna culture. The M458 and Z280 lineages spread around Poland, Belarus, Ukraine and western Russia, and would form the core of the Proto-Slavic culture. The high prevalence of R1a in Balto-Slavic countries nowadays is not only due to the Corded Ware expansion, but also to a long succession of later migrations from Russia, the last of which took place from the 5th to the 10th century CE. The Slavic branch differentiated itself when the Corded Ware culture absorbed the Cucuteni-Tripolye culture (5200-2600 BCE) of western Ukraine and north-eastern Romania. That culture appears to have been composed primarily of I2a1b (M423) lineages descended directly from Paleolithic Europeans, with a small admixture of Near-Eastern immigrants (notably E1b1b, G2a, J and T). Thus emerged the hybrid Globular Amphora culture (3400-2800 BCE) in what is now Ukraine, Belarus and Poland. It is surely during this period that I2a2, E-V13 and T spread (along with R1a) around Poland, Belarus and western Russia. This explains why eastern and northern Slavs (and Lithuanians) have between 10 and 20% of I2a1b lineages and about 10% of Middle Eastern lineages (18% for Ukrainians). After just a few centuries, this hybridised culture faded away into the dominant Corded Ware (2800-1800 BCE) and Catacomb (2800-2200 BCE) cultures.
The Corded Ware period was followed in the steppes by the Srubna culture (1800-1200 BCE), and around Poland by the Trzciniec culture (1700-1200 BCE). The last important Slavic migration is thought to have happened in the 6th century CE, from Ukraine to Poland, the Czech Republic and Slovakia, filling the vacuum left by eastern Germanic tribes who invaded the Roman Empire.
Historically, no other part of Europe was invaded more often by steppe peoples than the Balkans. Chronologically, the first R1a invaders came with the westward expansion of the Yamna culture (from 4200 BCE), a succession of steppe migrations that lasted about 2000 years. Then came the Thracians (1500 BCE), followed by the Illyrians (around 1200 BCE), the Huns and the Alans (400 CE), the Avars, the Bulgars and the Serbs (all around 600 CE), and the Magyars (900 CE), among others. These peoples originated from different parts of the Eurasian steppes, anywhere between Eastern Europe and Central Asia, which is why such high STR diversity is found within Balkanic R1a nowadays. It is not yet possible to determine the ethnic origin of each variety of R1a, apart from the fact that virtually any R1a is associated with tribes from the Eurasian steppes at some point in history.
Migration map of haplogroup R1a from the Neolithic to the late Bronze Age (c. 1000 BCE)
The Indo-Iranian branch
Proto-Indo-Iranian speakers, the people who later called themselves 'Aryans' in the Rig Veda and the Avesta, originated in the Sintashta-Petrovka culture (2100-1750 BCE), in the Tobol and Ishim valleys, east of the Ural Mountains. It was founded by pastoralist nomads from the Abashevo culture (2500-1900 BCE), ranging from the upper Don-Volga to the Ural Mountains, and the Poltavka culture (2700-2100 BCE), extending from the lower Don-Volga to the Caspian depression.
The Sintashta-Petrovka culture, associated with R1a-Z93 and its subclades, was the first Bronze Age advance of the Indo-Europeans east of the Urals. It opened the way to the vast plains and deserts of Central Asia and to the metal-rich Altai Mountains. The Aryans quickly expanded over all Central Asia, from the shores of the Caspian to southern Siberia and the Tian Shan, through trading, seasonal herd migrations, and looting raids.
Horse-drawn war chariots seem to have been invented by Sintashta people around 2100 BCE, and quickly spread to the mining region of Bactria-Margiana (modern border of Turkmenistan, Uzbekistan, Tajikistan and Afghanistan). Copper had been extracted intensively in the Urals, and the Proto-Indo-Iranians from Sintashta-Petrovka were exporting it in huge quantities to the Middle East. They appear to have been attracted by the natural resources of the Zeravshan valley, as a Petrovka copper-mining colony was established at Tugai around 1900 BCE, and tin was extracted soon afterwards at Karnab and Mushiston. Tin was an especially valued resource in the late Bronze Age, when weapons were made of copper-tin alloy, which is stronger than the more primitive arsenical bronze. In the 1700s BCE, the Indo-Iranians expanded to the lower Amu Darya valley and settled in irrigation farming communities (Tazabagyab culture). By 1600 BCE, the old fortified towns of Margiana-Bactria were abandoned, submerged by the northern steppe migrants. The group of Central Asian cultures under Indo-Iranian influence is known as the Andronovo horizon, and lasted until 800 BCE.
The Indo-Iranian migrations progressed further south across the Hindu Kush. By 1700 BCE, horse-riding pastoralists had penetrated into Balochistan (south-west Pakistan). The Indus valley succumbed circa 1500 BCE, and the northern and central parts of the Indian subcontinent were taken over by 500 BCE. Westward migrations led Old Indic Sanskrit speakers riding war chariots to Assyria, where they became the Mitanni rulers from circa 1500 BCE. The Medes, Parthians and Persians, all Iranian speakers from the Andronovo culture, moved into the Iranian plateau from 800 BCE. Those that stayed in Central Asia are remembered by history as the Scythians, while the Yamna descendants who remained in the Pontic-Caspian steppe became known as the Sarmatians to the ancient Greeks and Romans.
The Indo-Iranian migrations have resulted in high R1a frequencies in southern Central Asia, Iran and the Indian subcontinent. The highest frequency of R1a (about 65%) is reached in a cluster around Kyrgyzstan, Tajikistan and northern Afghanistan. In India and Pakistan, R1a ranges from 15 to 50% of the population, depending on the region, ethnic group and caste. R1a is generally strongest in the North-West of the subcontinent, and weakest in the Dravidian-speaking South (Tamil Nadu, Kerala, Karnataka, Andhra Pradesh) and from Bengal eastward. Over 70% of the Brahmins (the highest caste in Hinduism) belong to R1a1, due to a founder effect.
Maternal lineages in South Asia are, however, overwhelmingly pre-Indo-European. For instance, India has over 75% of "native" mtDNA M and R lineages and 10% of East Asian lineages. In the residual 15% of haplogroups, approximately half are of Middle Eastern origin. Only about 7 or 8% could be of "Russian" (Pontic-Caspian steppe) origin, mostly in the form of haplogroup U2 and W (although the origin of U2 is still debated). European mtDNA lineages are much more common in Central Asia though, and even in Afghanistan and northern Pakistan. This suggests that the Indo-European invasion of India was conducted mostly by men through war, and the first major settlement of women was in northern Pakistan, western India (Punjab to Gujarat) and northern India (Uttar Pradesh), where haplogroups U2 and W are the most common.
|The Tarim mummies|
In 1934 Swedish archaeologist Folke Bergman discovered some 200 mummies of fair-haired Caucasian people in the Tarim Basin in Northwest China (a region known as Xinjiang, East Turkestan or Uyghurstan). The oldest of these mummies date back to 2000 BCE, and all 7 male remains tested by Li et al. (2010) were positive for the R1a1 mutations. The modern inhabitants of the Tarim Basin, the Uyghurs, belong both to the R1b-M73 subclade (about 20%) and to R1a1 (about 30%).
The first theory about the origins of the Tarim mummies is that a group of early horse riders from the Repin culture (3700-3300 BCE) migrated from the Don-Volga region to the Altai Mountains, founding the Afanasevo culture (c. 3600-2400 BCE), whence they moved south to the Tarim Basin. Another possibility is that the Tarim mummies descend from the Proto-Indo-Iranian people (see above) who expanded all over Central Asia around 2000 BCE from the Sintashta-Petrovka culture. An offshoot would have crossed the Tian Shan mountains, ending up in the Tarim Basin. This theory has the merit of matching the dating of the Tarim mummies. Either way, most of the mummies tested for mtDNA belonged to the North Asian haplogroup C4, and only a few to European or Middle Eastern haplogroups (H, K and R).
There is some controversy regarding the possible link between the Tarim mummies and the Tocharian languages, a Centum branch of the Indo-European family that was spoken in the Tarim Basin from the 3rd to the 9th century CE. It is easy to assume that the Tarim mummies were Proto-Tocharian speakers because of the corresponding location and the Indo-European connection. However, the Tarim mummies predate the appearance of Tocharian by over two millennia, and Tocharian is a Centum language that cannot be descended from the Satem Proto-Indo-Iranian branch. The other Centum branches are all related to haplogroup R1b, and Tocharian is the only eastern Centum language. It is therefore possible that the Tocharian speakers were instead associated with the Central Asian R1b1b1 (M73) subclade, also found among the modern Uyghurs inhabiting the Tarim Basin.
|Turkic speakers and R1a|
The present-day inhabitants of Central Asia, from Xinjiang to Turkey and from the Volga to the Hindu Kush, speak in overwhelming majority Turkic languages. This may be surprising, as this corresponds to the region where the Indo-Iranian branch of Indo-European speakers expanded, the Bronze Age Andronovo culture, and the Iron Age Scythian territory. So why do Indo-European languages only survive in Slavic Russia or in the southern part of Central Asia, in places like Tajikistan, Afghanistan or some parts of Turkmenistan? Why don't the Uyghurs, Uzbeks, Kazakhs and Kyrgyz, or the modern Pontic-Caspian steppe people (Crimean Tatars, Nogais, Bashkirs and Chuvash), speak Indo-European vernaculars? Genetically these people do carry Indo-European R1a, and to a lesser extent also R1b, lineages. The explanation is that Turkic languages replaced the Iranian tongues of Central Asia between the 4th and 11th century CE.
Proto-Turkic originated in Mongolia and southern Siberia with such nomadic tribes as the Xiongnu. It has often been placed in the disputed Altaic linguistic family, alongside Mongolian and Manchu (some also include Korean and Japanese, although they share very little vocabulary). It is unknown when Proto-Turkic first emerged, but its spread started with the Hunnic migrations westward through the Eurasian steppe and all the way to Europe, stopped only by the boundaries of the Roman Empire.
The Huns are thought by many historians to have descended from the Xiongnu. Ancient DNA tests have revealed that the Xiongnu were already a hybrid Eurasian people 2,000 years ago, with mixed European and North-East Asian Y-DNA and mtDNA. Modern inhabitants of the Xiongnu homeland have approximately 90% Mongolian lineages against 10% European ones. The oldest identified presence of European mtDNA around Mongolia and Lake Baikal dates back over 6,000 years.
It appears that Turkic quickly replaced the Scythian and other Iranian dialects all over Central Asia. Other migratory waves brought more Turkic speakers to Eastern and Central Europe, like the Khazars, the Avars, the Bulgars and the Turks (=> see 5000 years of migrations from the Eurasian steppes to Europe). All of them were in fact Central Asian nomads who had adopted Turkic language, but had little if any Mongolian blood. Turkic invasions therefore contributed more to the diffusion of Indo-European lineages (especially R1a1) than East Asian ones.
Turkic languages have not survived in Europe outside the Pontic-Caspian steppe. The Bulgarian language, despite being named after a Turkic tribe, is actually a Slavic tongue with a mild Turkic influence. Hungarian, sometimes mistaken for the heir of Hunnic because of its name, is in reality a Uralic language (Magyar). The dozens of Turkic languages spoken in the world today have a high degree of mutual intelligibility due to their fairly recent common origin and the nomadic nature of their speakers (until recently). Its two main branches, Oghur and Common Turkic (which includes the Oghuz languages), could be seen as two languages about as distant as Spanish and Italian, and the languages within each branch as something like regional dialects of Spanish and Italian.
The Greek branch
Little is known about the arrival of Proto-Greek speakers from the steppes. The Mycenaean culture commenced circa 1650 BCE and is clearly an imported steppe culture. The close relationship between Mycenaean and Proto-Indo-Iranian languages suggests that they split fairly late, some time between 2500 and 2000 BCE. Archeologically, Mycenaean chariots, spearheads, daggers and other bronze objects show striking similarities with the Seima-Turbino culture (c. 1900-1600 BCE) of the northern Russian forest-steppes, known for the great mobility of its nomadic warriors (Seima-Turbino sites were found as far away as Mongolia). It is therefore likely that the Mycenaeans descended from Russia to Greece between 1900 and 1650 BCE, where they intermingled with the locals to create a new and unique Greek culture.
R1 populations spread genes for light skin, blond hair and red hair
There is now strong evidence that both R1a and R1b people contributed to the diffusion of the A111T mutation of the SLC24A5 gene, which explains approximately 35% of the skin tone difference between Europeans and Africans, and most variations within South Asia. The distribution pattern of the A111T allele (rs1426654) matches almost perfectly the spread of Indo-European R1a and R1b lineages around Europe, the Middle East, Central Asia and South Asia. The mutation was probably passed on in the Early Neolithic to other Near Eastern populations. This explains why Neolithic farmers in Europe already carried the A111T allele (e.g. Keller 2012 p.4, Lazaridis 2014 suppl. 7), although at a lower frequency than modern Europeans and southern Central Asians.
The light skin allele is also found at frequencies of 15 to 30% in various ethnic groups in northern sub-Saharan Africa, mostly in the Sahel and savannah zones inhabited by tribes of R1b-V88 cattle herders like the Fulani and the Hausa. This would presuppose that the A111T allele was already present among all R1b people before the Pre-Pottery Neolithic split between V88 and P297. R1a populations have an equally high incidence of this allele as R1b populations. On the other hand, the A111T mutation was absent from the 24,000-year-old R* sample from Siberia, and is absent from most modern R2 populations in Southeast India and Southeast Asia. Consequently, it can be safely assumed that the mutation arose among the R1* lineage during the late Upper Paleolithic, probably some time between 20,000 and 13,000 years ago.
Fair hair was another physical trait associated with the Indo-Europeans. In contrast, the genes for blue eyes were already present among Mesolithic Europeans belonging to Y-haplogroup I. The genes for blond hair are more strongly correlated with the distribution of haplogroup R1a, but those for red hair have not been found in Europe before the Bronze Age, and appear to have been spread primarily by R1b people (=> see The origins of red hair).
The maternal lineages (mtDNA) corresponding to haplogroup R1a
Comparing the regions where haplogroup R1a is found today with the modern mtDNA frequencies, it transpires that the maternal lineages that correlate the most with Y-haplogroup R1a are mt-haplogroups C4a, H1b, H1c, H2a1, H6, H7, H11, T1a1a1, U2e, U4, U5a1a and W, as well as some subclades of I, J, K, T2 and V (see below). Ancient mtDNA from Northeast Europe also corroborates this.
The oldest mtDNA sample from the presumed homeland of R1a men is a 37,000-year-old early modern human from the Kostenki 14 site on the Don River in southern Russia, who belonged to haplogroup U2. This haplogroup is found at low frequencies (1-2%) throughout Europe today and is most common in Russia. Only two U2 subclades are found in Europe: U2d (rare) and U2e (the most common). All other U2 subclades are typically found in South Asia, where they can exceed 20% of the population in some regions, particularly in Pakistan and northern India where R1a and R2 are the most common. U2 is found as far away as Southeast Asia, where Y-haplogroups R1a and R1b are absent, but R2 is present. All these elements combined support the hypothesis that U2 was one of the original lineages of basal Y-haplogroup R, and that U2 was spread alongside R1a, R1b and R2. Note that the 24,000-year-old Mal'ta boy belonged to mt-haplogroup U and Y-haplogroup R*.
Haplogroup C is also found among Northeast Europeans, and appears to have been present in the region at least since the Mesolithic period. MtDNA C is a lineage typically found in North Asia, East Asia and among Native Americans. The subclades of C found in Europe are C1, C4a and C5, all subclades found in the Altai region, southern Siberia and particularly western Siberia. They could have been among the maternal lineages of North Asian mammoth hunters during the late Paleolithic, alongside haplogroup U2. C5 was identified in Mesolithic Karelia (north-western Russia), while C4a2 was among the lineages of the Dnieper-Donets culture in Neolithic Ukraine. C4a3 and C4a6 samples dating from the Bronze Age (Catacomb culture) were also found in the Odessa region of Ukraine. Both C4a and C5 are common among the Turkmens, Uzbeks and Tajiks today, populations with substantial levels of R1a and R1b. C1 has been found at the Pre-Pottery Neolithic site of Tell Halula in northern Syria, in the region where cattle were first domesticated. As such, it is possible that C1 was brought by Y-haplogroup R1b, which could have originated in northern Mesopotamia before moving to the Pontic Steppe (see R1b history). C1 has also been found among modern Basques and Catalonians, two populations that completely lack East Asian autosomal admixture and Y-haplogroup R1a, but have over 80% R1b. C4a and C5 are more typical of modern Balto-Slavic populations, although their combined frequency is usually under 1% of the population. Li et al. (2010) tested 20 mtDNA and 7 Y-DNA samples of Early Bronze Age (c. 2000 BCE) remains from the Tarim Basin in Xinjiang, north-western China. Of these so-called Tarim mummies, 14 out of 20 mtDNA samples belonged to haplogroup C4, while the six others belonged to M*, R* (3 samples), H and K. All Y-DNA lineages turned out to be R1a1a.
Haplogroups U4 and W are the two lineages most typical of Balto-Slavic countries. Along with U5 and V, these were probably native Paleolithic European lineages linked to Y-DNA haplogroup I that were assimilated by the R1a mammoth hunters from North Asia.
By the Mesolithic period, the population of Ukraine, Belarus and European Russia would have become a hybrid of Siberian R1a male lineages and European H, I, U4, U5a1, T, V and W female lineages, with only a minority of Siberian maternal haplogroups (C4a, C5, U2). The presence of haplogroups C1, C4, C5, H (including H17), T, U2e, U4, U5a and U5b have all been confirmed by DNA tests from Mesolithic European Russia. V has been found in Mesolithic Iberia and Sweden. Only I and W have not yet been identified in ancient samples prior to the Neolithic. All these haplogroups are now found in regions settled by the Indo-Europeans during the Bronze Age and where R1a is present today, including the Caucasus, the Altai, Central Asia, South Asia and the Middle East.
Central Asian populations with elevated percentages of R1a are very useful for separating R1a's maternal lineages from the mass of other European haplogroups. For example, mt-haplogroup J2b1a, a typically European subclade of J, has been found among the Kalash of northern Pakistan, alongside H2a1, U2e and U4. J2b1a hasn't been found anywhere else east of Iran. The Kalash have 20% R1a, but no R1b. Similarly, the subclades K1b1b and K1c2 were found in Tajikistan, a country with 30% R1a and only 3% R1b.
K2b was found by Keyser et al. (2009) in Bronze Age samples related to the Andronovo culture from the Krasnoyarsk area in southern Siberia. The male samples tested from the same site belonged to R1a. Nowadays K2b is found mostly in central and north-eastern Europe (R1a countries) and can therefore safely be linked to the diffusion of the R1a branch of the Indo-Europeans.
H1b, H1c, H2a1 and H11, some of the most common subclades of H in Eastern Europe and the North Caucasus, consistently show up in Siberia and Central Asia too.
Haplogroup T1a1a1 is fairly common in northern, central and eastern Europe, primarily among R1a populations. It is also found in Iran and Pakistan, among the Pathans, Brahui and Hunza Burusho, all ethnic groups with high percentages of R1a, but no R1b. Unlike T2, T1a1a1 has never been found in Europe before the Bronze Age, and makes its first appearance with the Corded Ware culture. Additionally, T1a1a1 emerged less than 7,000 years ago, barely one millennium before the beginning of the Bronze Age in the Pontic Steppe.
Haplogroup T2 is more complex because of its very large number of subclades. Some can nevertheless be connected with confidence to the Bronze Age diffusion of R1a. T2b2 and T2b4 were found in remains from the Corded Ware culture, and haven't been found in Europe before the Bronze Age. Nowadays, both subclades are found in most of Europe, but T2b2 is also found at low frequencies in Iran, Turkmenistan and India, while T2b4 has been found in Azerbaijan, Mesopotamia, Uzbekistan, Kazakhstan and Nepal. T2b16, a subclade typical of north-eastern Europe, has been found as far east as the Volga-Ural region and Kazakhstan. It might have been part of the Abashevo expansion towards the Urals. T2a1b1 was found in Bronze Age samples related to the Andronovo culture in southern Siberia, where Y-DNA samples all belonged to R1a. Furthermore, its modern distribution also matches R1a territories, ranging from Scandinavia to Central Asia, and cropping up sporadically in Iran and around the Near East.
Haplogroup W is generally found in regions with high percentages of R1a and is particularly common in Balto-Slavic countries, in Bactria, northern Pakistan and northwest India. It has not been found in Paleolithic or Mesolithic European samples. Although some W1 samples did turn up in Late Neolithic Europe (one W1 in Spain and two W1c in Germany), they may have ended up there through westward drift from the Pontic Steppe after centuries of intermarriage between neighbouring populations. Other W subclades suddenly popped up in Europe during the Bronze Age, such as in the Corded Ware (W5a, W6a'b) and Unetice (W3a1) cultures. In India, haplogroup W is considerably more common among the upper castes and among Indo-European speakers according to Metspalu et al. (2004).
Subclades of haplogroup V are the most difficult to set apart at the moment, principally because many V subclades can only be identified by testing the coding region of the mitochondrial DNA, and most studies of non-European mtDNA to date have only tested the hypervariable region (HVR). For now, the only subclade that can be singled out as corresponding to R1a is V7a, which is specific to Slavic countries and has been found in Azerbaijan too.
See also: Identifying the original Indo-European mtDNA from isolated settlements
In 2003, an Oxford University scientist traced the Y-chromosome signature of Somerled of Argyll (1100-1164), a military and political leader of the Scottish Isles of Norse-Gaelic descent. Somerled drove the Vikings out of Scotland and became King of Mann and the Isles. He was the founder of Clan Somhairle, the father of the founder of Clan MacDougall, and the paternal grandfather of the founder of Clan Donald (which includes the MacDonalds and MacAlisters). The researcher reported that the tested members of these clans with a confirmed paper trail all belonged to the Norwegian variety of R1a-L448, and more specifically to the subclade L176.1, which to date has been found almost exclusively among the descendants of Somerled. In 2005, geneticist Bryan Sykes asked for DNA samples from clan chiefs (Lord Godfrey Macdonald, Sir Ian Macdonald of Sleat, Ranald MacDonald of Clan Ranald, William McAlester of Loup and Ranald MacDonnell of Glengary) to complete the project, and all matched the presumed Somerled haplotype. Not all Macdonalds, MacAlisters and MacDougalls are descended from Somerled, though. The majority (about 70%) are members of the Celtic haplogroup R1b. Check the Donald Clan Genetic Genealogy Project for more information.
Based on descendant testing, it appears most likely that the sultans of the Ottoman dynasty belonged to haplogroup R1a-Z93. This has not been officially confirmed yet. All sultans of the Ottoman Empire (1299-1922) descend in patrilineal line from Osman I, making it one of the longest reigning Y-chromosomal lineages in history.
The Drake DNA Surname Project managed to identify the haplogroup of Sir Francis Drake, the famous English navigator and privateer of the Elizabethan era. Two of his known descendants were tested by two different companies and both lineages had practically identical STR values, which confirmed their recent common ancestry. Other Drakes also turned up with the same haplotype. All of them belong to the typically northwestern European R1a-L664 (DYS388=10).
An analysis of the Hume DNA Project has provided conclusive evidence that the Scottish philosopher, historian and economist David Hume (1711-1776) belonged to haplogroup R1a-CTS4179. This subclade is the most common Scottish variety of R1a. It is believed to have come from Norway with the Vikings. David Hume was one of the fathers of the Scottish Enlightenment and one of the leading Empiricist philosophers.
The American actor, producer, writer, and director Tom Hanks, best known for his roles in the films Philadelphia, Forrest Gump, Saving Private Ryan, Catch Me If You Can and The Da Vinci Code, was found to belong to haplogroup R1a through the Hanks DNA Surname Project as a descendant of William Hanks of Richmond, Virginia.