The speakers of the Proto-Indo-European language are thought to have originated in the Pontic-Caspian Steppe in Ukraine and southern European Russia during the Late Copper Age and Early Bronze Age. Their paternal lineages were identified by ancient DNA tests. The western branch, associated with the development of Centum languages (Celtic, Italic and Germanic, as well as probably Illyrian and Albanian) belonged primarily to haplogroup R1b-L23 and its subclades. They originated in the southern part of the Yamna culture (3500-2500 BCE).
The eastern branch, associated with Satem languages, i.e. the future Baltic, Slavic, Iranian and Indo-Aryan languages (as well as probably also Thracian and Mycenaean Greek), has its roots in the Corded Ware culture (c. 3000-2350 BCE), which emerged from the northern part of the Yamna culture, in the forest-steppe zone. It is principally linked with the diffusion of haplogroup R1a-M417 and its subclades. The western R1b-dominant branch would have carried a minority of R1a and vice versa. Both groups would also have harboured other minority haplogroups, including I2a2a-L701, which was found in Mesolithic and Neolithic Ukraine alongside R1a and R1b, and Q1a, which was found in the Khvalynsk culture, the Chalcolithic predecessor of Yamna between the Don and the Volga.
Haplogroups R1a and R1b are now found across most of Europe and vast swathes of the Middle East, Central Asia and South Asia. As Indo-Europeans advanced from the Pontic-Caspian Steppe by marrying local women as well as Indo-European women, it is difficult to estimate what the original mtDNA haplogroups of Indo-European tribes were back in the steppes, before their great migrations during the Bronze Age. There are two methods to determine the haplogroups of the original Steppe Indo-Europeans.
The first method is to look at ancient DNA, which is pretty straightforward and gives us unambiguous answers. But that method has five drawbacks or limitations:
- 1. Ancient DNA sequencing has only been available for a few years and the number of tested samples is still too limited to get a complete view of the numerous maternal lineages found in the Yamna culture and the various Indo-European cultures descended from it.
- 2. It's not always possible to tell from individual samples whether an individual (especially if female and lacking Y-DNA) belonged to an R1a-dominant or R1b-dominant tribe within the Steppe. Even if the individual is male and his Y-DNA can be identified, it remains inconclusive as some R1b people were found among R1a tribes and the other way round.
- 3. Most samples tested from Bronze Age cultures come from individual elite burials, which are not necessarily representative of the whole population of that culture at that time.
- 4. Some haplogroups that were brought to Europe by the Indo-Europeans during the Bronze Age may have also been found among the Mesolithic or Neolithic populations outside of the Steppe. For example, it is certain that the Bronze Age Indo-Europeans shared haplogroups U2, U4 and U5 with Mesolithic hunter-gatherers from the rest of Europe. These lineages are now found in parts of Asia where the Indo-Europeans settled and their frequency correlates with the amount of Indo-European admixture. Ancient DNA from the Sredny Stog culture, which preceded Yamna in Ukraine, showed that haplogroups U4 and U5a1 were already present in the Pontic Steppe in the Neolithic. Additionally, it seems that some clades of H and K were also present among Mesolithic Europeans, including H1, H4, H10, H11, K1a and K1c.
- 5. Ancient DNA is sometimes too damaged to obtain the full mitochondrial genome necessary to determine the deep clades within a haplogroup. Yet deep clades provide the opportunity to distinguish closely related lineages, such as the varieties of haplogroups U4 and U5a found inside and outside the Steppe. Some deep clades may be exclusively of Steppe origins, while others may not be at all.
The second method attempts to remedy these problems by analysing the modern mtDNA lineages found in all places historically settled by the Indo-Europeans, from Europe to South Asia via Central Asia, and comparing which mtDNA haplogroups are found in regions with high densities of R1a or R1b in populations which were separated from each other thousands of years ago. Those mtDNA lineages must then be cross-checked with known Mesolithic and Neolithic lineages from outside the Steppe.
For example, if a lineage wasn't found in Europe before the Bronze Age, then suddenly shows up in cultures descended from Yamna, it can be assumed to be of Indo-European origin. If one of these mt-haplogroups is consistently found today in Baltic and Slavic countries as well as in Central Asia and/or Iran or Pakistan, there is a high chance that this lineage spread with R1a-dominant tribes of the Satem branch. If on the other hand another haplogroup is found across Central and Western Europe, and in parts of Central Asia with higher levels of R1b like Turkmenistan or Uzbekistan, then it is likely to have been propagated by R1b-dominant steppe tribes.
A certain level of overlap exists between the maternal lineages of R1a-dominant and R1b-dominant tribes, as is to be expected, since both share the same origins in the Pontic Steppe during the Yamna period. Nevertheless, some lineages are far more correlated with one group than the other.
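The reasoning of this second method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region tags follow the examples given above, while the function name `classify`, the 75%/25% cut-offs and the simple region count are hypothetical choices made for the sketch, not part of any published method.

```python
# Illustrative tags only: which steppe paternal lineage dominates in each
# region, following the examples in the text. Not measured frequencies.
REGION_TAG = {
    "Baltic countries": "R1a",
    "Slavic countries": "R1a",
    "Iran": "R1a",
    "Pakistan": "R1a",
    "Central Europe": "R1b",
    "Western Europe": "R1b",
    "Turkmenistan": "R1b",
}


def classify(regions_found, found_outside_steppe_before_bronze_age=False):
    """Give a rough label to one mtDNA lineage from its modern distribution."""
    # A lineage already present outside the Steppe in the Mesolithic or
    # Neolithic cannot be attributed to the Indo-Europeans on this basis.
    if found_outside_steppe_before_bronze_age:
        return "not diagnostic"
    tags = [REGION_TAG[r] for r in regions_found if r in REGION_TAG]
    if not tags:
        return "unknown"
    share_r1a = tags.count("R1a") / len(tags)
    if share_r1a >= 0.75:  # hypothetical cut-off
        return "R1a-associated"
    if share_r1a <= 0.25:  # hypothetical cut-off
        return "R1b-associated"
    return "shared"


print(classify(["Baltic countries", "Slavic countries", "Pakistan"]))
print(classify(["Western Europe", "Central Europe", "Turkmenistan"]))
print(classify(["Baltic countries", "Western Europe"]))
```

A real analysis would weigh actual haplogroup frequencies rather than a yes/no presence per region, but the three outcomes (R1a-associated, R1b-associated, shared) mirror the overlap described above.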
Isolated Indo-European settlements in Asia, such as the Tarim basin in north-western China, the Altai region in southern Siberia, or Bactria and Margiana in southern Central Asia, provide unique opportunities to find European mtDNA in regions that are otherwise genetically Asian. Since there wasn't any other major European settlement of these regions historically, if we exclude white Russians from the sampled populations, the European mtDNA found in these regions would necessarily correlate with Bronze and Iron Age Indo-Europeans, i.e. haplogroups R1a and R1b. The only interference could come from Middle Eastern mtDNA, especially in Muslim parts of Central Asia and in Xinjiang. Fortunately we have an idea of what mt-haplogroups could have been brought by E1b1b, J1, J2a and T1a populations. They would have brought haplogroups such as HV, N1, J, K, T2 and U3. The Scythians in particular are thought to have hybridised extensively with the descendants of Neolithic southern Central Asians, as ancient Scythian mtDNA contains a lot of these haplogroups (alongside H2a1, U2, U5 and East Asian lineages). But all Scythian Y-DNA tested so far has turned out to be R1a, although the Scythians surely included R1b, J1 and J2 lineages too.
The maternal lineages (mtDNA) corresponding to haplogroup R1b
Hundreds of mtDNA samples from Neolithic, Mesolithic and even Paleolithic Europe have been tested to date. These ancient DNA studies revealed that haplogroup R1b lineages were widespread in Mesolithic Southeast and Northeast Europe. Hunter-gatherer tribes where R1b was the dominant (or only) paternal lineage were found in Latvia, around the so-called Iron Gates on the Danube between Serbia and Romania, as well as in Ukraine (both Mesolithic and Neolithic) and in the Middle Volga region of Russia. The maternal lineages they carried were typical of Mesolithic Europeans. Like other tribes belonging to other Y-DNA lineages (C1a2, I*, I2a, I2c, R1a), these R1b tribes belonged primarily to mtDNA haplogroups U5a and U5b, with minorities of U2 and U4.
A few R1b individuals from the Iron Gates Mesolithic culture also possessed less typical mtDNA like K1, K1a, H13 and H40. Nevertheless, none of these Mesolithic R1b individuals are ancestral to the R1b of modern Europeans, as none possessed the M269 and L23 mutations. If these lineages were present in the Pontic-Caspian Steppe before R1b-L23 tribes crossed the Caucasus into the Steppe during the Copper Age, they would just have been assimilated by the newcomers, but may have nothing to do with the original R1b-L23 people.
R1b-M269 evolved in isolation for thousands of years, presumably in the area between eastern Anatolia and northwestern Iran, and came back to Eastern Europe with the additional L23 mutation during the Chalcolithic period. Nowadays men carrying the M269 mutation without the L23 mutation are practically all found around Armenia and the southern edge of the Caspian Sea in Azerbaijan and northern Iran. R1b-L23 brought copper metallurgy to the Pontic-Caspian Steppe and became an elite lineage within the nascent Proto-Indo-European society, quickly replacing a big share of other Steppe lineages (I2a, J2b2a, Q1a, R1a, R1b) probably through the practice of polygamy.
Many female lineages in the Yamna culture were in fact different from the previous Mesolithic and Neolithic lineages found in the Steppe, and clearly originated in West Asia and the Caucasus. The new female lineages now included H2b, H6a1, H15, K1b2, I1a, J2b, T1a, T2c1, W3, W6 and X2h, in addition to the previous lineages. As expected, most of these lineages were also found in other Bronze Age cultures associated with Proto-Indo-European speakers, including the Corded Ware and Unetice cultures. Out of 46 Yamna mtDNA samples tested as of June 2017, Mesolithic U2, U4 and U5 only represented 28% of the total. It's possible that some H and K samples also had a Mesolithic or at least Neolithic origin in the Steppe.
The European or Middle Eastern lineages found in Central and North Asia include HV6, HV9, H1a, H1b, H1c, H1f, H1h, H2a1, H2b, H5a, H6a1b, H6b, H7a, H7c, H8a, H8b1, H8c, H11a, H13a, H14a, H15, H20, H27a, H35, H49c, H97, I, J1b1a, J1c (incl. J1c2m, J1c5 and J1c10a), K1a1a, K1a32, K1b1, K1b2, K1c, K2a3, K2a5, K2a6, K2b, N1a, R, T1a1, T2a1b, T2b, T2c1, T2e, T2f, U1, U2e, U3, U4a1, U4a2, U4b1, U4b2, U4b3, U4c1, U4d1, U4d2, U5a1, U5a2, U5b2, U8a1, V7a, V15, W3, W4, W5, W6, X2e1, and X2e2.
If we exclude the haplogroups of Neolithic farmers, those found principally in R1a countries today, and Middle Eastern lineages that are rare in Europe, what is left as the potential maternal lineages of Bronze Age R1b men are H2b, H5a, H6, H8c, H15, I, J1b1a, K1c1, K2b, T1a1, T2, U2e, U4, U5, U8, V7 and W. It is likely that H4, H5a, H6, K2b, U2, U4, U5 and U8 came from the indigenous Mesolithic and Neolithic Steppe population. Haplogroups H1b, H5a, T2b and U8b1 were found in the Cucuteni-Trypillian culture and at least some of them probably entered the Steppe gene pool by intermarriage.
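The elimination step described here amounts to a set difference. The sketch below applies it to a handful of lineages taken from the list of European or Middle Eastern lineages found in Central and North Asia. The helper name `remaining_candidates` is hypothetical, and so is the assignment of each lineage to an exclusion group: the text names the three groups but does not enumerate their members, so the groupings are assumptions made for illustration only.

```python
def remaining_candidates(observed, *excluded_groups):
    """Return the lineages in `observed` that belong to none of the excluded groups."""
    excluded = set().union(*excluded_groups)
    return sorted(set(observed) - excluded)


# A small subset of the lineages observed in Central and North Asia.
observed = {"H2b", "H15", "H1b", "HV6", "N1a", "U3"}

# Assumed groupings, for illustration only.
neolithic_farmers = set()  # not enumerated in the text, left empty here
r1a_associated = {"H1b"}  # H1b is listed among the R1a-correlated lineages
middle_eastern_rare_in_europe = {"HV6", "N1a", "U3"}  # assumed from HV, N1, U3

print(remaining_candidates(
    observed, neolithic_farmers, r1a_associated, middle_eastern_rare_in_europe
))  # ['H15', 'H2b']
```

With these assumed groupings only H2b and H15 survive the filter, both of which appear in the list of potential maternal lineages of Bronze Age R1b men.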
What were the original mtDNA lineages of Chalcolithic R1b tribes in Transcaucasia?
The most likely candidates for the original maternal lineages of R1b-M269 and R1b-L23 tribes in Transcaucasia are H8c, H15, I1a1, I2, I3, I4, J1b1a, T1a1, T2b (at least some clades like T2b2 and T2b4), V15 and W. Apart from some T2b and W1 clades, none of these haplogroups have been found in Europe before the Indo-European migrations, nor among Neolithic or Chalcolithic Near Easterners. Nowadays J1b1a and T1a1 display particularly strong correlations with the distribution of Y-haplogroup R1b.
Haplogroups V7 and T2b were both found in remains of the Maykop culture, and H2a1, I, K1c1, R1a and W are all common in the North Caucasus today. Some of these lineages could have been indigenous to the North Caucasus and assimilated by the R1b-L23 migrants during the Maykop period (probably H2a1, K1c1 and R1a, which are rare or absent from West Asia), while others would have been brought by R1b-L23 themselves from the other side of the Caucasus (haplogroups I, T2b, V7 and W, which are common today in Azerbaijan and northwest Iran).
Haplogroup H2b is found in Northwest Europe, Turkey, Siberia, Pakistan and India. H8c is found in central and western Europe, Georgia, Armenia, Central Asia and the Altai. Their low frequency in the Middle East suggests a Caucasian origin, although possibly more in the North Caucasus for H2b, just like H2a1.
Nowadays H2a1 and V7a are found mostly in North Slavic countries, but could have been brought by the R1b-L23 minority that blended with the R1a tribes of the Corded Ware culture (ancestral to the Slavic people). H2a1 was indeed found in the Corded Ware and Unetice cultures. The same thing would have happened with some clades of haplogroup W. Nowadays it is especially W5 and W7 that are found in R1b-dominant countries, while other clades are found more in R1a countries or have a mixed distribution.
Haplogroup I has a pan-Caucasian origin and some clades (I5, I6, I7) are also linked to the spread of the Kura-Araxes culture. Those that correlate most with the Indo-European branch of R1b are I1a1, I2, I3 and I4a.
Haplogroups H15 and J1b1a are both found around Armenia and Iran. Only the J1b1a subclade seems to be related to the propagation of R1b-L23. Other J1b subclades are geographically restricted to the Near East, particularly from the Caucasus to the Arabian peninsula. J1b1a has been found in the Corded Ware and Unetice cultures and in the Baltic Bronze Age.
T1a1 is found in Transcaucasia, but also in Kurdistan and Mesopotamia, which indicates a possible connection with the Uruk expansion. T1a1 has been found in many early PIE cultures, including Yamna, Corded Ware, Potapovka, Srubnaya and the Baltic Bronze Age. Yet, unlike other T1 subclades, it has never been found in Mesolithic or Neolithic samples from the Near East or Europe, except for one individual from Baalberge in Late Neolithic Germany (2500 BCE) who was contemporary with the arrival of Steppe people in the region (and therefore may simply be a misidentified Steppe migrant).
Haplogroup V15 is found today in Armenia and Northwest Europe and is also a good candidate as a lineage brought by R1b-L23 tribes from Transcaucasia.
The maternal lineages (mtDNA) corresponding to haplogroup R1a
Comparing the regions where haplogroup R1a is found today with the modern mtDNA frequencies, it transpires that the maternal lineages that correlate the most with Y-haplogroup R1a are mt-haplogroups H1b, H1c, H2a1, H6, H7, H11, T1a1a1, U2e, U4, U5a1a and W, as well as some subclades of I, J, K, T2 and V (see below). Ancient mtDNA from Northeast Europe also corroborates this.
The oldest mtDNA sample from the presumed homeland of R1a men is a 30,000-year-old early modern human from the Kostenki 14 site on the Don River in southern Russia, who belonged to haplogroup U2. This haplogroup is found at low frequencies (1-2%) throughout Europe today and is most common in Russia. Only two U2 subclades are found in Europe: U2d (rare) and U2e (the most common). All other U2 subclades are typically found in South Asia, where they can exceed 20% of the population in some regions, particularly in Pakistan and northern India where paternal haplogroups R1a and R2 are dominant. U2 is found as far as Southeast Asia, where Y-haplogroups R1a and R1b are absent, but R2 is present. All these elements combined support the hypothesis that U2 was one of the original lineages of basal Y-haplogroup R, and that U2 was spread alongside R1a, R1b and R2. Note that the 24,000-year-old Mal'ta boy belonged to mt-haplogroup U and Y-haplogroup R*.
Haplogroups U4 and W are the two lineages most typical of Balto-Slavic countries. Along with U5 and V, these were probably native Paleolithic European lineages linked to Y-DNA haplogroup I that were assimilated by the R1a mammoth hunters from North Asia.
By the Mesolithic period, the population of Ukraine, Belarus and European Russia would have become a hybrid of Siberian R1a and Q1a paternal lineages and European H, I, U4, U5a1, T, V and W maternal lineages, with only a minority of Siberian maternal haplogroups (C4a, C5, U2). The presence of haplogroups C1, C4, C5, H (including H17), T, U2e, U4, U5a and U5b has been confirmed by DNA tests from Mesolithic European Russia. V has been found in Mesolithic Iberia and Sweden. Only I and W have not yet been identified in ancient samples prior to the Neolithic. All these haplogroups are now found in regions settled by the Indo-Europeans during the Bronze Age and in particular where R1a has a considerable presence today, including the Caucasus, the Altai, Central Asia, South Asia and the Middle East.
Central Asian populations with elevated percentages of R1a are very useful to identify R1a's equivalent maternal lineages from the mass of other European haplogroups. For example mt-haplogroup J2b1a, a typically European subclade of J, has been found among the Kalash of northern Pakistan, alongside H2a1, U2e and U4. J2b1a hasn't been found elsewhere east of Iran. The Kalash have 20% of R1a, but no R1b. Similarly, the subclades K1b1b and K1c2 were found in Tajikistan, a country that has 30% of R1a and only 3% of R1b.
K2b was found by Keyser et al. (2009) in Bronze Age samples related to the Andronovo culture from the Krasnoyarsk area in southern Siberia. The male samples tested from the same site belonged to R1a. Nowadays K2b is found mostly in central and north-eastern Europe (R1a-dominant countries) and can therefore safely be linked to the diffusion of the R1a branch of the Indo-Europeans.
H1b, H1c, H2a1 and H11, some of the most common subclades of H in Eastern Europe and the North Caucasus, constantly show up around Siberia and Central Asia too.
Haplogroup T1a1a1 is fairly common in northern, central and eastern Europe, primarily among R1a populations. It is also found in Iran and Pakistan, among the Pathans, Brahui and Hunza Burusho, all ethnic groups with high percentages of R1a, but no R1b. Unlike T2, T1a1a1 has never been found in Europe before the Bronze Age, and makes its first appearance with the Corded Ware culture. Additionally, T1a1a1 emerged less than 7,000 years ago, barely one millennium before the beginning of the Bronze Age in the Pontic Steppe.
Haplogroup T2 is more complex because of its very large number of subclades. Some can nevertheless be connected with confidence with the Bronze Age diffusion of R1a. T2b2 and T2b4 were found in remains from the Corded Ware culture, and haven't been found in Europe before the Bronze Age. Nowadays, both subclades are found in most of Europe, but T2b2 is also found at low frequencies in Iran, Turkmenistan and India, while T2b4 has been found in Azerbaijan, Mesopotamia, Uzbekistan, Kazakhstan and Nepal. T2b16, a subclade typical of north-eastern Europe, has been found as far east as the Volga-Ural region and Kazakhstan. It might have been part of the Abashevo expansion towards the Urals. T2a1b1 was found in Bronze Age samples related to the Andronovo culture in southern Siberia, where Y-DNA samples all belonged to R1a. Furthermore, its modern distribution also matches territories with a high density of R1a, ranging from Scandinavia to Central Asia, cropping up sporadically in Iran and around the Near East.
Haplogroup W is generally found in regions with high percentages of R1a and is particularly common in Balto-Slavic countries, in Bactria, northern Pakistan and northwest India. It has not been found in Paleolithic or Mesolithic European samples. Although some W1 samples did turn up in Late Neolithic Europe (one W1 in Spain and two W1c in Germany), they may have ended up there through westward drift from the Pontic Steppe after centuries of intermarriage between neighbouring populations. Other W subclades suddenly popped up in Europe during the Bronze Age, such as in the Corded Ware (W5a, W6a'b) and Unetice (W3a1) cultures. In India, haplogroup W is considerably more common among the upper castes and among Indo-European speakers according to Metspalu et al. (2004).
Subclades of haplogroup V are the most difficult to set apart at the moment, principally because a lot of V subclades can only be identified by testing the coding region of the mitochondrial DNA - which is more costly and therefore less often carried out. Besides, most of the studies of non-European mtDNA to date only tested the hypervariable region (HVR). The only subclade that could be singled out at the moment as corresponding to R1a is V7a, which is specific to Slavic countries and has been found in Azerbaijan too.
The maternal lineages (mtDNA) corresponding to haplogroup Q1a
Mitochondrial haplogroup C is also found among Northeast Europeans, and appears to have been present in the region at least since the Mesolithic period. MtDNA C is a lineage found in Siberia (especially between the Altai and Lake Baikal, as well as in Beringia) and among Native Americans, typically in populations with high levels of Y-DNA haplogroup Q1a. It is also found among East Asians, although at a much lower frequency, just like Y-haplogroup Q1a. The subclades of C found in Europe are C1, C4a and C5, all subclades found in the Altai region, southern Siberia and particularly western Siberia. They could have been some of the maternal lineages of North Asian mammoth hunters during the late Paleolithic, alongside haplogroup U2 (presumably linked to Y-haplogroups C1a and R1). There is evidence that Paleolithic and Mesolithic North Asian hunters carrying Y-haplogroup Q1a and mt-haplogroup C reached Eastern Europe and Fennoscandia.
C5 was identified in Mesolithic Karelia (north-western Russia), while C4a2 was among the lineages of the Dnieper-Donets culture in Neolithic Ukraine. C4a3 and C4a6 samples dating from the Bronze Age (Catacomb culture) were also found in the Odessa region of Ukraine.
Both C4a and C5 are common among the Turkmens, Uzbeks and Tajiks today, populations with substantial levels of R1a and R1b, but who also carry Y-haplogroup Q1a. Haplogroup Q is found in 5 to 10% of Turkmen, Uzbek and Tajik men, but can reach 40% of the paternal lineages in Turkmens of Afghanistan and Iran who live in the areas adjacent to Turkmenistan. Haplogroup Q1a1-L715, which expanded from the Bronze Age onwards, is found all the way from Ireland to Uzbekistan via Hungary, Poland, Russia and the North Caucasus. It undeniably has an Indo-European connection.
In Europe, C4a and C5 are more typical of modern Baltic and Slavic populations, although their combined frequency is usually under 1% of the population. Li et al. (2010) tested 20 mtDNA and 7 Y-DNA samples of Early Bronze Age (c. 2000 BCE) remains from the Tarim basin in Xinjiang, north-western China. Of these so-called Tarim mummies, 14 out of 20 mtDNA samples belonged to haplogroup C4, while the six others belonged to M*, R* (3 samples), H and K. All Y-DNA lineages turned out to be R1a1a.
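To make the proportions in the Tarim sample easier to read, the counts quoted above can be turned into frequencies. This is a minimal sketch using only the figures given in the text; the variable names are arbitrary.

```python
from collections import Counter

# mtDNA counts for the 20 Tarim basin samples, as quoted in the text.
tarim_mtdna = Counter({"C4": 14, "R*": 3, "M*": 1, "H": 1, "K": 1})

total = sum(tarim_mtdna.values())
frequencies = {hg: n / total for hg, n in tarim_mtdna.most_common()}

for haplogroup, share in frequencies.items():
    print(f"{haplogroup}: {share:.0%}")
```

Haplogroup C4 accounts for 70% of the maternal lineages in a group whose tested paternal lineages were all R1a1a, although with only 20 samples the percentages should be read as rough indications.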
C1 has been found at the Pre-Pottery Neolithic site of Tell Halula in northern Syria, in the region where cattle were first domesticated. As such, it is possible that C1 was brought by Y-haplogroup R1b, which could have originated in northern Mesopotamia before moving to the Pontic Steppe (see R1b history). C1 has also been found among modern Basques and Catalonians, two populations that completely lack East Asian autosomal admixture and Y-haplogroup R1a, but have over 80% of R1b. The Basques stand out in Southwest Europe as the only population having any Y-haplogroup Q at all, even if only 0.5% of the male lineages.