Haplogroup R1a probably branched off from R1* around the time of the Last Glacial Maximum (19,000 to 26,000 years before present). Little is known for certain about its place of origin. Some think it might have originated in the Balkans or around Pakistan and Northwest India, due to the greater genetic diversity found in these regions. This diversity can be explained by other factors, though. The Balkans have been subject to 5000 years of migrations from the Eurasian Steppes, each bringing new varieties of R1a. South Asia has had a much bigger population than any other part of the world (occasionally equalled by China) for at least 10,000 years, and larger populations bring about more genetic diversity. The most likely place of origin of R1a is Central Asia or southern Russia/Siberia.
R1a is thought to have been the dominant haplogroup among the northern and eastern Proto-Indo-European speakers, whose languages evolved into the Indo-Iranian, Thracian, Baltic and Slavic branches. The Proto-Indo-Europeans originated in the Yamna culture (3300-2500 BCE). Their dramatic expansion was possible thanks to an early adoption of bronze weapons and the domestication of the horse in the Eurasian steppes (circa 4000-3500 BCE). The southern Steppe culture is believed to have carried predominantly R1b (M269 and M73) lineages, while the northern forest-steppe culture would have been essentially R1a-dominant. The first expansion of the forest-steppe people occurred with the Corded Ware Culture (see the Germanic branch below). The migration of the R1b people to Central and Western Europe left a vacuum for R1a people in the southern steppe around the time of the Catacomb culture (2800-2200 BCE). The forest-steppe origin of this culture is obvious from the introduction of corded pottery and the abundant use of polished battle axes, the two most prominent features of the Corded Ware culture. This is also probably when the satemisation process of the Indo-European languages began, since the Balto-Slavic and Indo-Iranian language groups belong to the same Satem isogloss and both appear to have evolved from the Catacomb culture.
Ancient DNA testing has confirmed the presence of haplogroup R1a1a in samples from the Corded Ware culture in Germany (2600 BCE), from the Tarim mummies (2000 BCE) in Northwest China, from Kurgan burials (circa 1600 BCE) of the Andronovo culture in southern Russia and southern Siberia, as well as from a variety of Iron Age sites in Russia, Siberia, Mongolia and Central Asia.
Distribution of haplogroup R1a in Europe
Nowadays, high frequencies of R1a are found in Poland (57.5% of the population), Ukraine (40 to 65%), European Russia (45 to 65%), Belarus (51%), Slovakia (42%), Latvia (40%), Lithuania (38%), the Czech Republic (34%), Hungary (32%), Norway (27%), Austria (26%), Croatia (24%), north-east Germany (24%), Sweden (19%) and Romania (18%).
Subclades & Haplotypes
About 99% of R1a people belong to subclades of R1a1a1 (R1a-M417), which is divided into the following subclades:
- R1a-L664 is essentially Northwest European, found chiefly in West Germany, the Low Countries and the British Isles.
- R1a-Z645 makes up the bulk of R1a individuals from Central Europe to South Asia.
  - R1a-Z283 is the main Central & East European branch.
    - R1a-Z284 is a Scandinavian subclade with an epicentre in Norway. It is also found in places colonised by the Norwegian Vikings, like some parts of Scotland, England and Ireland. Several subclades have been identified, including L448, L176.1, Z287/Z288, Z66 and Z281, about which little is known at the moment.
    - R1a-M458 is primarily a Balto-Slavic subclade, with maximum frequencies in Poland, Lithuania, the Czech Republic and Slovakia, but is also fairly common in south-east Ukraine and Northwest Russia.
      - Its subclade R1a-L260 is clearly West Slavic, with a peak of frequency in Poland, the Czech Republic and Slovakia, and radiating at lower frequencies into East Germany, East Austria, Slovenia and Hungary.
    - R1a-Z280 is also a Balto-Slavic marker, found all over Central and Eastern Europe (except in the Balkans), with a western limit running from eastern Germany down to south-western Germany and on to Northeast Italy. It can be divided into many clusters: East Slavic, Baltic, Pomeranian, Polish, Carpathian, East-Alpine, Czechoslovak, and so on.
      - Its subclade R1a-L365 is a Pomeranian cluster also found in southern Poland.
  - R1a-Z93 is the main Asian branch of R1a. It is found in Central Asia, South Asia and Southwest Asia (including among Ashkenazi Jews). R1a-Z93 is the marker of historical peoples such as the Indo-Aryans, Persians, Medes, Mitanni or Tatars, and pervaded the genetic pool of the Arabs and Jews.
    - Its subclade R1a-M434 makes up a small percentage of the population of Pakistan. Traces have also been found in Oman.
A lot of Western and Northern European R1a that is negative for the marker Z284 falls under the root R1a1a1* (M417), or even in the older R1a1a (M17) and R1a1 (SRY10831.2). The former are descended from the oldest known expansion of R1a out of the Forest-Steppe, the Corded Ware Culture (see below), which predates all the above subclades. At present no subclade has been identified by a common SNP. However, Klyosov et al. (2009) found that a substantial percentage of R1a in Northwest Europe, particularly in Norway, England, Ireland and Iceland, had a repeat value of 10 (instead of 12) at the STR marker DYS388. Among them, some individuals were identified as carrying the mutation L664. The origin of the older subclades (M17 and SRY10831.2) is still unclear (perhaps Mesolithic hunter-gatherers roaming around Europe).
History of R1a
The Germanic branch
The first major expansion of R1a took place with the westward propagation of the Corded Ware (or Battle Axe) culture (2800-1800 BCE) from the northern forest-steppe in the Yamna homeland. This was the first wave of R1a into Europe, the one that brought the Z283 subclade to Germany and the Netherlands, and Z284 to Scandinavia. The Corded Ware R1a people would have mixed with the pre-Germanic I1 and I2 aborigines, which resulted in the first Indo-European culture in Germany and Scandinavia, although that culture could not be considered Proto-Germanic; it was simply Proto-Indo-European at that stage, or perhaps Proto-Balto-Slavic.
Germanic languages probably did not appear before the Nordic Bronze Age (1800-500 BCE). The Proto-Germanic language probably developed as a blend of two branches of Indo-European languages, namely the Proto-Balto-Slavic language of the Corded Ware culture (R1a-Z283) and the language of the Proto-Italo-Celto-Germanic people who later arrived from the Unetice culture (R1b-L11). This is supported by the fact that Germanic people are an R1a-R1b hybrid, that these two haplogroups came via separate routes at different times, and that Proto-Germanic is closest to Proto-Italo-Celtic, but also shares similarities with Proto-Slavic.
The R1b branch of the Indo-Europeans is thought to have originated in the southern Yamna culture (northern shores of the Black Sea). It was the first one to move from the steppes to Europe, invading the Danube delta around 4200 BCE, then making its way around the Balkans and the Hungarian plain in the 4th millennium BCE. It is likely that a minority of R1a people accompanied this R1b migration. Those R1a men would have belonged to the L664 subclade, the first to split from the Yamna core. These early steppe invaders were not a homogeneous group, but a cluster of tribes. It is possible that the R1a-L664 people were one or several separate tribes of their own, or that they mixed with some R1b lineages, notably R1b-U106, which would become the main Germanic lineage many centuries later. The R1b-R1a contingent moved up the Danube to the Pannonian plain around 2800 BCE, brought to an end the local Bell Beaker (circa 2200 BCE) and Corded Ware (c. 2400 BCE) cultures in Central Europe, and set up the Unetice culture (2300-1600 BCE) around Bohemia and eastern Germany. Unetice can be seen as the source of future Germanic, Celtic and Italic cultures, and is associated with the L11 subclade of R1b.
The late Unetice culture expanded to Scandinavia, founding the Nordic Bronze Age. R1a-L664 and R1b (L11 and U106) presumably reached Scandinavia at this time. People from the Nordic Bronze Age probably spoke a Proto-Germanic language, which over more than a thousand years acquired vocabulary from the indigenous Corded Ware language, itself a mixture of Proto-Balto-Slavic and non-IE pre-Germanic. Linguists have estimated that the first genuine Germanic tongue came into existence around (or after) 500 BCE, just as the Nordic Bronze Age came to an end, giving way to the Pre-Roman Iron Age. The uniqueness of some of the Germanic vocabulary points to borrowing from native pre-Indo-European languages (Germanic substrate theory). The Celtic language itself is known to have borrowed from Afro-Asiatic languages spoken by Near-Eastern immigrants to Central Europe. The fact that present-day Scandinavia is composed of roughly 40% I1, 20% R1a and 40% R1b reinforces the idea that the Germanic ethnicity and language had acquired a tri-hybrid character by the Iron Age.
The Baltic branch
The Baltic branch is thought to have evolved from the Fatyanovo culture (3200-2300 BCE), the northeastern extension of the Corded Ware culture. Early Bronze Age R1a nomads from the northern steppes and forest-steppes would have mixed with the indigenous Uralic-speaking inhabitants (N1c1 lineages) of the region. This is supported by a strong presence of both R1a and N1c1 haplogroups from southern Finland to Lithuania and the adjacent part of Russia.
The Slavic branch
The origins of the Slavs go back to circa 3500 BCE with the northern Yamna culture. The M458 and Z280 lineages spread around Poland, Belarus, Ukraine and western Russia, and would form the core of the Proto-Slavic culture. The high prevalence of R1a in Balto-Slavic countries nowadays is not only due to the Corded Ware expansion, but also to a long succession of later migrations from Russia, the last of which took place from the 5th to the 1th century CE. The Slavic branch differentiated itself when the Corded Ware culture absorbed the Cucuteni-Tripolye culture (5200-2600 BCE) of western Ukraine and north-eastern Romania, which appears to have been composed primarily of I2a1b (M423) lineages descended directly from Paleolithic Europeans, with a small admixture of Near-Eastern immigrants (notably E1b1b, G2a, J and T). Thus emerged the hybrid Globular Amphora culture (3400-2800 BCE) in what is now Ukraine, Belarus and Poland. It is surely during this period that I2a1b, E-V13 and T spread (along with R1a) around Poland, Belarus and western Russia, explaining why eastern and northern Slavs (and Lithuanians) have between 10 and 20% of I2a1b lineages and about 10% of Middle Eastern lineages (18% for Ukrainians). After just a few centuries, this hybridised culture faded away into the dominant Corded Ware (2800-1800 BCE) and Catacomb (2800-2200 BCE) cultures.
The Corded Ware period was followed in the steppes by the Srubna culture (1800-1200 BCE), and around Poland by the Trzciniec culture (1700-1200 BCE). The last important Slavic migration is thought to have happened in the 6th century CE, from Ukraine to Poland, the Czech Republic and Slovakia, filling the vacuum left by eastern Germanic tribes who invaded the Roman Empire.
Historically, no other part of Europe was invaded more times by steppe peoples than the Balkans. Chronologically, the first R1a invaders came with the westward expansion of the Yamna culture (from 4200 BCE), a succession of steppe migrations that lasted about 2000 years. Then came the Thracians (1500 BCE), followed by the Illyrians (around 1200 BCE), the Huns and the Alans (400 CE), the Avars, the Bulgars and the Serbs (all around 600 CE), and the Magyars (900 CE), among others. These peoples originated from different parts of the Eurasian steppes, anywhere between Eastern Europe and Central Asia, which is why such high STR diversity is found within Balkan R1a nowadays. It is not yet possible to determine the ethnic origin of each variety of R1a, apart from the fact that virtually any R1a is associated with tribes from the Eurasian steppe at some point in history.
Migration map of haplogroup R1a from the Neolithic to the late Bronze Age (c. 1000 BCE)
The Indo-Iranian branch
Proto-Indo-Iranian speakers, the people who later called themselves 'Aryans' in the Rig Veda and the Avesta, originated in the Sintashta-Petrovka culture (2100-1750 BCE), in the Tobol and Ishim valleys, east of the Ural Mountains. It was founded by pastoralist nomads from the Abashevo culture (2500-1900 BCE), ranging from the upper Don-Volga to the Ural Mountains, and the Poltavka culture (2700-2100 BCE), extending from the lower Don-Volga to the Caspian depression.
The Sintashta-Petrovka culture, associated with R1a-Z93 and its subclades, was the first Bronze Age advance of the Indo-Europeans east of the Urals, opening the way to the vast plains and deserts of Central Asia and to the metal-rich Altai Mountains. The Aryans quickly expanded over all of Central Asia, from the shores of the Caspian to southern Siberia and the Tian Shan, through trading, seasonal herd migrations and looting raids.
Horse-drawn war chariots seem to have been invented by Sintashta people around 2100 BCE, and quickly spread to the mining region of Bactria-Margiana (modern border of Turkmenistan, Uzbekistan, Tajikistan and Afghanistan). Copper had been extracted intensively in the Urals, and the Proto-Indo-Iranians from Sintashta-Petrovka were exporting it in huge quantities to the Middle East. They appear to have been attracted by the natural resources of the Zeravshan valley, as a Petrovka copper-mining colony was established at Tugai around 1900 BCE, and tin was extracted soon afterwards at Karnab and Mushiston. Tin was an especially valued resource in the late Bronze Age, when weapons were made of copper-tin alloy, which is stronger than the more primitive arsenical bronze. In the 1700s BCE, the Indo-Iranians expanded to the lower Amu Darya valley and settled in irrigation farming communities (Tazabagyab culture). By 1600 BCE, the old fortified towns of Margiana-Bactria were abandoned, submerged by the northern steppe migrants. The group of Central Asian cultures under Indo-Iranian influence is known as the Andronovo horizon, and lasted until 800 BCE.
The Indo-Iranian migrations progressed further south across the Hindu Kush. By 1700 BCE, horse-riding pastoralists had penetrated into Balochistan (south-west Pakistan). The Indus valley succumbed circa 1500 BCE, and the northern and central parts of the Indian subcontinent were taken over by 500 BCE. Westward migrations led Old Indic Sanskrit speakers riding war chariots to Assyria, where they became the Mitanni rulers from circa 1500 BCE. The Medes, Parthians and Persians, all Iranian speakers from the Andronovo culture, moved into the Iranian plateau from 800 BCE. Those that stayed in Central Asia are remembered by history as the Scythians, while the Yamna descendants who remained in the Pontic-Caspian steppe became known as the Sarmatians to the ancient Greeks and Romans.
The Indo-Iranian migrations have resulted in high R1a frequencies in southern Central Asia, Iran and the Indian subcontinent. The highest frequency of R1a (about 65%) is reached in a cluster around Kyrgyzstan, Tajikistan and northern Afghanistan. In India and Pakistan, R1a ranges from 15 to 50% of the population, depending on the region, ethnic group and caste. R1a is generally stronger in the North-West of the subcontinent, and weakest in the Dravidian-speaking South (Tamil Nadu, Kerala, Karnataka, Andhra Pradesh) and from Bengal eastward. Over 70% of the Brahmins (the highest caste in Hinduism) belong to R1a1, due to a founder effect.
Maternal lineages in South Asia are, however, overwhelmingly pre-Indo-European. For instance, India has over 75% of "native" mtDNA M and R lineages and 10% of East Asian lineages. Of the residual 15% of haplogroups, approximately half are of Middle Eastern origin. Only about 7 or 8% could be of "Russian" (Pontic-Caspian steppe) origin, mostly in the form of haplogroups U2 and W (although the origin of U2 is still debated). European mtDNA lineages are much more common in Central Asia though, and even in Afghanistan and northern Pakistan. This suggests that the Indo-European invasion of India was conducted mostly by men through war, and the first major settlement of women was in northern Pakistan, western India (Punjab to Gujarat) and northern India (Uttar Pradesh), where haplogroups U2 and W are the most common.
The Tarim mummies
In 1934 Swedish archaeologist Folke Bergman discovered some 200 mummies of fair-haired Caucasian people in the Tarim Basin in Northwest China (a region known as Xinjiang, East Turkestan or Uyghurstan). The oldest of these mummies date back to 2000 BCE, and all 7 male remains tested by Li et al. (2010) were positive for the R1a1 mutations. The modern inhabitants of the Tarim Basin, the Uyghurs, belong both to the R1b-M73 subclade (about 20%) and to R1a1 (about 30%).
The first theory about the origins of the Tarim mummies is that a group of early horse riders from the Repin culture (3700-3300 BCE) migrated from the Don-Volga region to the Altai Mountains, founding the Afanasevo culture (c. 3600-2400 BCE), whence they moved south to the Tarim Basin. Another possibility is that the Tarim mummies descend from the Proto-Indo-Iranian people (see above) who expanded all over Central Asia around 2000 BCE from the Sintashta-Petrovka culture. An offshoot would have crossed the Tian Shan mountains, ending up in the Tarim Basin. This theory has the merit of matching the dating of the Tarim mummies. Either way, most of the mummies tested for mtDNA belonged to the Mongoloid haplogroup C4, and only a few to European or Middle Eastern haplogroups (H, K and R).
There is some controversy regarding the possible link between the Tarim mummies and the Tocharian languages, a Centum branch of the Indo-European family that was spoken in the Tarim Basin from the 3rd to 9th centuries CE. It is easy to assume that the Tarim mummies were Proto-Tocharian speakers due to the corresponding location and the Indo-European connection. However, the Tarim mummies predate the appearance of Tocharian by over two millennia, and Tocharian is a Centum language that cannot be descended from the Satem Proto-Indo-Iranian branch. Since all other Centum branches are related to haplogroup R1b, and Tocharian is the only eastern Centum language, it is possible that the Tocharian speakers were instead associated with the Central Asian R1b1b1 (M73) subclade, also found among the modern Uyghurs inhabiting the Tarim Basin.
Turkic speakers and R1a
The present-day inhabitants of Central Asia, from Xinjiang to Turkey and from the Volga to the Hindu Kush, overwhelmingly speak Turkic languages. This may be surprising, as this corresponds to the region where the Indo-Iranian branch of Indo-European speakers expanded, the Bronze Age Andronovo culture, and the Iron Age Scythian territory. So why is it that Indo-European languages only survive in Slavic Russia or in the southern part of Central Asia, in places like Tajikistan, Afghanistan or some parts of Turkmenistan? Why don't the Uyghurs, Uzbeks, Kazakhs and Kyrgyz, or the modern Pontic-Caspian steppe peoples (Crimean Tatars, Nogais, Bashkirs and Chuvashs), speak Indo-European vernaculars? Genetically these people do carry Indo-European R1a, and to a lesser extent also R1b, lineages. The explanation is that Turkic languages replaced the Iranian tongues of Central Asia between the 4th and 11th centuries CE.
Proto-Turkic originated in Mongolia and southern Siberia with such nomadic tribes as the Xiongnu. It belongs to the Altaic linguistic family, like Mongolian and Manchu (some also include Korean and Japanese, although they share very little vocabulary). It is unknown when Proto-Turkic first emerged, but its spread started with the Hunnic migrations westward through the Eurasian steppe and all the way to Europe, only stopped by the boundaries of the Roman Empire.
The Huns were the descendants of the Xiongnu. Ancient DNA tests have revealed that the Xiongnu were already a hybrid Eurasian people 2,000 years ago, with mixed European and North-East Asian Y-DNA and mtDNA. Modern inhabitants of the Xiongnu homeland have approximately 90% of Mongolian lineages against 10% of European ones. The oldest identified presence of European mtDNA around Mongolia and Lake Baikal dates back to over 6,000 years ago.
It appears that Turkic quickly replaced the Scythian and other Iranian dialects all over Central Asia. Other migratory waves brought more Turkic speakers to Eastern and Central Europe, like the Khazars, the Avars, the Bulgars and the Turks (see 5000 years of migrations from the Eurasian steppes to Europe). All of them were in fact Central Asian nomads who had adopted a Turkic language, but had little if any Mongolian blood. Turkic invasions therefore contributed more to the diffusion of Indo-European lineages (especially R1a1) than of East Asian ones.
Turkic languages have not survived in Europe outside the Pontic-Caspian steppe. Bulgarian, despite being named after a Turkic tribe, is actually a Slavic tongue with a mild Turkic influence. Hungarian, sometimes mistaken for the heir of Hunnic because of its name, is in reality an Uralic language (Magyar). The dozens of Turkic languages spoken in the world today have a high degree of mutual intelligibility due to their fairly recent common origin and the (until recently) nomadic nature of their speakers. The two main branches, Oghuz and Oghur, could be seen as two languages about as distant as Spanish and Italian, and the languages within each branch as regional dialects of Spanish or Italian.
The Greek branch
Little is known about the arrival of Proto-Greek speakers from the steppes. The Mycenaean culture commenced circa 1650 BCE and is clearly an imported steppe culture. The close relationship between Mycenaean and Proto-Indo-Iranian languages suggests that they split fairly late, some time between 2500 and 2000 BCE. Archeologically, Mycenaean chariots, spearheads, daggers and other bronze objects show striking similarities with those of the Seima-Turbino culture (c. 1900-1600 BCE) of the northern Russian forest-steppes, known for the great mobility of its nomadic warriors (Seima-Turbino sites were found as far away as Mongolia). It is therefore likely that the Mycenaeans descended from Russia to Greece between 1900 and 1650 BCE, where they intermingled with the locals to create a new, unique Greek culture.
Arms of the MacDonald Clan
In 2003, an Oxford University scientist traced the Y-chromosome signature of Somerled of Argyll (1100-1164), a military and political leader of the Scottish Isles of Norse-Gaelic descent. Somerled drove the Vikings out of Scotland and became King of Mann and the Isles. He was the founder of Clan Somhairle, the father of the founder of Clan MacDougall, and the paternal grandfather of the founder of Clan Donald (which includes the MacDonalds and MacAlisters). The researcher reported that the tested members of these clans with a confirmed paper trail all belonged to the Norwegian variety of R1a-L448, and more specifically to the subclade L176.1, which to date has been found almost exclusively among the descendants of Somerled. In 2005, geneticist Bryan Sykes asked for DNA samples from clan chiefs (Lord Godfrey Macdonald, Sir Ian Macdonald of Sleat, Ranald MacDonald of Clan Ranald, William McAlester of Loup and Ranald MacDonnell of Glengarry) to complete the project, and all matched the presumed Somerled haplotype. Not all Macdonalds, MacAlisters and MacDougalls are descended from Somerled though. The majority (about 70%) are members of the Celtic haplogroup R1b. Check the Donald Clan Genetic Genealogy Project for more information.
The Drake DNA Surname Project managed to identify the haplogroup of Sir Francis Drake, the famous English navigator and privateer from the Elizabethan era. Two of his known descendants were tested by two different companies and both lineages had practically identical STR values, which confirmed their recent common ancestry. Other Drakes also turned up with the same haplotype. All of them belong to the typically northwestern European R1a-L664 (DYS388=10).
The American actor, producer, writer, and director Tom Hanks, best known for his roles in the films Philadelphia, Forrest Gump, Saving Private Ryan, Catch Me If You Can and The Da Vinci Code, was found to belong to haplogroup R1a through the Hanks DNA Surname Project, as a descendant of William Hanks of Richmond, Virginia.