Last updated September 2013 (updated haplogroups A, C, G2a, I2, L, Q, R1a, R1b, T, and all maps). Tip: you can now access this page by typing haplogroups.eu directly into your browser.
The information about the origin and ethnic association of haplogroups on this website should not be read as hard facts, but, as is often the case in science, as a model in constant evolution based on the present knowledge and understanding (of the authors). Whenever the advancement of genetics couldn't provide irrefutable answers, we have attempted to provide the most likely and logical hypothesis based on archeological, historical and linguistic evidence. This page is being updated regularly to keep up with recent studies giving additional insights or rectifying possibly erroneous theories. Feel free to add comments or share your opinion on the forum.
Nucleobases are the alphabet of DNA. There are four of them: adenine (A), thymine (T), guanine (G) and cytosine (C). They always come in pairs, A with T, and G with C. Such pairs are called "base pairs".
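The pairing rule can be illustrated with a minimal Python sketch. The function name `complement` and the example sequence are made up for illustration; only the A-T and G-C pairing comes from the text above.

```python
# Pairing rule stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


example = "GATTACA"  # hypothetical sequence, not taken from any real genome
print(example)
print(complement(example))  # CTAATGT
```

Applying `complement` twice returns the original strand, which is just another way of saying that the pairing is symmetric.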
The human genome is composed of about 3,000 million base pairs, spread over 23 chromosomes; each person carries two copies of it, for a total of 46 chromosomes.
The Y chromosome possesses 60 million nucleobases, against 153 million for the X chromosome.
Mitochondrial DNA is found outside the cell's nucleus, and therefore outside of the chromosomes. It consists of only 16,569 bases.
A SNP (single nucleotide polymorphism) is a mutation in a single base pair. At present, only a few hundred SNPs define all the human haplogroups for mtDNA or Y-DNA.
DNA studies have made it possible to categorise all humans on Earth into genealogical groups sharing one common ancestor at one given point in prehistory. They are called haplogroups. There are two kinds of haplogroups: the paternally inherited Y-chromosome DNA (Y-DNA) haplogroups, and the maternally inherited mitochondrial DNA (mtDNA) haplogroups. They respectively indicate the agnatic (or patrilineal) and cognatic (or matrilineal) ancestry.
Y-DNA haplogroups are useful to determine whether two apparently unrelated individuals sharing the same surname do indeed descend from a common ancestor in a not too distant past (3 to 20 generations). This is achieved by comparing their haplotypes through the STR markers. Deep SNP testing makes it possible to go back much further in time, and to identify the ancient ethnic group to which one's ancestors belonged (e.g. Celtic, Germanic, Slavic, Greco-Roman, Basque, Iberian, Phoenician, Jewish, etc.).
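As a rough illustration of what comparing haplotypes through STR markers can look like, here is a minimal Python sketch. It assumes a simple step-wise count (the sum of differences in repeat counts over the markers both men were tested for), which is a simplification and not necessarily the method used by any particular testing company. The function name and the repeat counts are hypothetical.

```python
def genetic_distance(haplotype_a: dict, haplotype_b: dict) -> int:
    """Sum of absolute repeat-count differences over the markers present in both haplotypes."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)


# Hypothetical repeat counts for four STR markers, for illustration only.
person_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
person_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10}

print(genetic_distance(person_1, person_2))  # 2
```

The smaller the distance over a given set of markers, the more recent the common paternal ancestor is likely to be. Turning a distance into a number of generations requires assumptions about mutation rates that are not covered here.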
In Europe, mtDNA haplogroups are quite evenly spread over the continent, and therefore cannot be associated easily with ancient ethnicities. However, they can sometimes reveal some potential medical conditions (see diseases associated with mtDNA mutations). Some mtDNA subclades are associated with Jewish ancestry, notably K1a1b1a, K1a9, K2a2a and N1b.
The study of Y-chromosomes is far more interesting than that of mitochondrial DNA for two reasons.
Firstly, the Y chromosome is a sequence of 60 million "characters" (nucleobases), against only 16,569 for mtDNA. The Y chromosome therefore offers a much greater resolution, as mutations are more common and indeed happen in nearly every generation. In contrast, mtDNA mutations happen much less frequently. Since the time of Mitochondrial Eve, approximately 200,000 years ago, modern humans have acquired on average 20 mtDNA mutations in each lineage, about one every ten thousand years. Even though the number of new mutations has risen with the rapid growth of the human population over the last 10,000 years, the dating of lineages based on mtDNA alone remains very approximate, and practically useless for historical times. By sequencing the full Y chromosome, it is theoretically possible to map the entire patrilineal genealogy of humanity (or any other species) within a few generations (in some cases even within one generation). This is a colossal task, and an expensive one too, since full chromosome sequencing (reading every nucleobase one by one) remains very expensive compared to SNP genotyping (checking only for mutations already discovered in other individuals). DNA tests provided to the general public (23andMe, FTDNA...) only use genotyping, so new mutations are not normally discovered by such tests (unless they are repeat mutations from other haplogroups). This is why population geneticists have only managed to sketch very broad lineages so far. The deepest subclades identified still encompass tens or hundreds of thousands of individuals.
The second advantage of Y-DNA over mtDNA is that men have traditionally been less mobile than women. In almost every settled, agricultural society, men were the ones who inherited their parents' property, and therefore remained in the same location generation after generation. Women, on the other hand, were often sent away to marry in another village or town, so that their lineages spread more evenly over time.
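The figures behind the first reason can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the round numbers quoted above (200,000 years, 20 mutations per lineage, 60 million versus 16,569 bases); the function name is made up for illustration.

```python
def years_per_mutation(elapsed_years: float, mutations_per_lineage: float) -> float:
    """Average time between two mutations in one lineage."""
    return elapsed_years / mutations_per_lineage


# mtDNA: about 20 mutations per lineage since Mitochondrial Eve, roughly 200,000 years ago.
print(years_per_mutation(200_000, 20))  # 10000.0

# Size ratio between the Y chromosome and mtDNA, using the figures quoted in the text.
y_chromosome_bases = 60_000_000
mtdna_bases = 16_569
size_ratio = y_chromosome_bases / mtdna_bases
print(round(size_ratio))  # 3621
```

With a sequence several thousand times longer, the Y chromosome has far more positions at which a mutation can occur, which is why it can resolve much more recent branches than mtDNA.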
Chronological development of Y-DNA haplogroups
K => 40,000 years ago (probably arose in northern Iran)
T => 30,000 years ago (around the Red Sea or around the Persian Gulf)
J => 30,000 years ago (in the Middle East)
R => 28,000 years ago (in Central Asia)
E1b1b => 26,000 years ago (in Northeast Africa)
I => 25,000 years ago (in the Balkans)
R1a => 21,000 years ago (in southern Russia)
R1b => 20,000 years ago (around the Caspian Sea or Central Asia)
J1 => 20,000 years ago (in the Taurus/Zagros mountains)
J2 => 19,000 years ago (in northern Mesopotamia)
E-M78 => 18,000 years ago (in north-eastern Africa)
G => 17,000 years ago (in the Middle East)
I2 => 17,000 years ago (in the Balkans)
E-V13 => 14,000 years ago (in the southern Levant or North Africa)
I2b => 13,000 years ago (in Central Europe)
N1c1 => 12,000 years ago (in Siberia)
I2a => 11,000 years ago (in the Balkans)
G2a => 11,000 years ago (in the Levant or Anatolia)
R1b1b2 => 10,000 years ago (north or south of the Caucasus)
E-M81 => 9,500 years ago (in Northwest Africa)
I2b1 => 9,000 years ago (in Germany)
I2a1 => 8,000 years ago (in Southwest Europe)
I2a2 => 7,500 years ago (in Southeast Europe)
I1 => 5,000 years ago (in Scandinavia)
R1b-L21 => 4,000 years ago (in Central or Eastern Europe)
R1b-S28 => 3,500 years ago (around the Alps)
R1b-S21 => 3,000 years ago (in Frisia or Central Europe)
I2b1a => less than 3,000 years ago (in Britain)
Map of early Bronze Age cultures in Europe around 4,500 to 5,000 years ago
R1b is the most common haplogroup in Western Europe, reaching over 80% of the population in Ireland, the Scottish Highlands, western Wales, the Atlantic fringe of France and the Basque country. It is also common in Anatolia and around the Caucasus, in parts of Russia and in Central and South Asia. Besides the Atlantic and North Sea coast of Europe, hotspots include the Po valley in north-central Italy (over 70%), the Ossetians of the North Caucasus (over 40%) and nearby Armenia (35%), the Bashkirs of the Urals region of Russia (50%), Turkmenistan (over 35%), the Hazara people of Afghanistan (35%), the Uyghurs of North-West China (20%) and the Newars of Nepal (11%). R1b-V88, a subclade specific to sub-Saharan Africa, is found in 60 to 95% of men in northern Cameroon.
Anatolian or Caucasian origins?
The Paleolithic origins of R1b are not entirely clear to this day. Some of the oldest forms of R1b are found around the Caucasus, in Iran and in southern Central Asia, a vast region where nomadic R1b hunter-gatherers could have roamed during the Ice Age. Haplogroups R1* and R2* might have originated in southern Central Asia (between the Caspian depression and the Hindu Kush). A branch of R1 would have developed into R1b, then R1b1 and R1b1a, in the northern part of the Middle East around the time of the Last Glacial Maximum (circa 20,000 years ago), while R1a migrated north to Siberia. R1b1a presumably moved to northern Anatolia and across the Caucasus during the Neolithic, where it split into R1b1a1 (M73) and R1b1a2 (M269). The Near Eastern leftovers evolved into R1b1c (V88), now found at low frequencies among the Lebanese, the Druze, and the Jews. The Phoenicians (who came from modern-day Lebanon) spread this R1b1c to their colonies, notably Sardinia and the Maghreb.
R1b1a2 (the most common form in Europe) and R1b1a1 are closely associated with the spread of Indo-European languages, as attested by their presence in all regions of the world where Indo-European languages were spoken in ancient times, from the Atlantic coast of Europe to the Indian subcontinent, including almost all of Europe (except Finland and Bosnia-Herzegovina), Anatolia, Armenia, European Russia, southern Siberia, many pockets around Central Asia (notably Xinjiang, Turkmenistan, Tajikistan and Afghanistan), without forgetting Iran, Pakistan, India and Nepal. The histories of R1b and R1a are intricately connected.
The North Caucasus and the Pontic-Caspian steppe: the Indo-European link
Modern linguists have placed the Proto-Indo-European homeland in the Pontic-Caspian Steppe, a distinct geographic and archeological region extending from the Danube estuary to the Ural mountains in the east and the North Caucasus in the south. The Neolithic, Eneolithic and early Bronze Age cultures in the Pontic-Caspian steppe have been called the Kurgan culture (7000-2200 BCE) by Marija Gimbutas, due to the lasting practice of burying the dead under mounds ("kurgan") among the succession of cultures in that region. It is now known that kurgan-type burials only date from the 4th millennium BCE and almost certainly originated south of the Caucasus.
Horses were first domesticated around 4600 BCE in the Caspian Steppe, perhaps somewhere around the Don or the lower Volga, and soon became a defining element of steppe culture. Nevertheless it is unlikely that R1b was already present in the eastern steppes at the time, so the domestication of the horse should be attributed to the indigenous R1a people.
It is not yet entirely clear when R1b crossed over from eastern Anatolia to the Pontic-Caspian steppe. This could have happened during or just after the Neolithic, or both. Since the genetic diversity of R1b is greater around the Caucasus, it is hard to deny that R1b evolved there before entering the steppe world. It is possible that a first R1b migration from Anatolia in the 5th or even 6th millennium BCE introduced sheep into the steppe, an animal whose wool would play an important role in Celtic and Germanic (R1b branches of the Indo-Europeans) clothing traditions up to this day. Another migration across the Caucasus happened shortly before 3700 BCE, when the Maykop culture, the world's first Bronze Age society, appeared apparently out of nowhere in the Northwest Caucasus. The origins of Maykop are still uncertain, but archeologists have linked it to contemporary Chalcolithic cultures in Assyria and western Iran. Archeology also shows a clear diffusion of bronze working and kurgan-type burials from the Maykop culture to the Pontic Steppe, where the Yamna culture developed soon afterwards (from 3500 BCE). Kurgan (a.k.a. tumulus) burials would become a dominant feature of ancient Indo-European societies and were widely used by the Celts, Romans, Germanic tribes, and Scythians, among others.
The Yamna period (3500-2500 BCE) is the most important one in the creation of Indo-European culture and society. Middle Eastern R1b people had been living and blending to some extent with the local R1a foragers and herders for over a millennium, perhaps even two or three. The close cultural contact and interactions between R1a and R1b people all over the Pontic-Caspian Steppe resulted in the creation of a common vernacular, a new lingua franca, which linguists have called Proto-Indo-European (PIE). It is pointless to try to assign another region of origin to the PIE language. Linguistic similarities exist between PIE and Caucasian and Hurrian languages in the Middle East on the one hand, and Uralic languages in the Volga-Ural region on the other hand.
During the Yamna period cattle and sheep herders adopted wagons to transport their food and tents, which allowed them to move deeper into the steppe, giving rise to a new mobile lifestyle that would eventually lead to the great Indo-European migrations.
The Yamna horizon was not a single, unified culture. In the south, along the northern shores of the Black Sea as far as the Northwest Caucasus, was a region of open steppe, extending eastward to the Caspian Sea, Siberia and Mongolia (the Eurasian Steppe). The western section, between the Don and Dniester Rivers (and later the Danube), was the one most densely settled by R1b people, with only a minority of R1a people (5-10%). The eastern section, in the Volga basin as far as the Ural mountains, was inhabited by R1a people with a small minority (less than 5%) of R1b people. The northern part of the Yamna horizon was forest-steppe occupied by R1a people, also joined by a small minority of R1b. The western branch would migrate to the Balkans and Greece, then to Central and Western Europe, and back to Anatolia. The eastern branch would migrate to Central Asia, Xinjiang, Siberia, and South Asia (Iran, Pakistan, India). The northern branch would evolve into the Corded Ware culture and spread to the Baltic, Poland, Germany and Scandinavia.
The Maykop culture, the R1b link to the steppe?
The Maykop culture (3700-2500 BCE) in the Northwest Caucasus was culturally speaking a sort of southern extension of the Yamna horizon. Although not generally considered part of the Pontic-Caspian steppe culture due to its geography, the North Caucasus had close links with the steppes, as attested by numerous ceramics, gold, copper and bronze weapons and jewelry in the contemporaneous cultures of Mikhaylovka, Sredny Stog and Kemi Oba. The link between the northern Black Sea coast and the North Caucasus is older than the Maykop period. Its predecessor, the Svobodnoe culture (4400-3700 BCE), already had links to the Suvorovo-Novodanilovka and early Sredny Stog cultures, and the even older Nalchik settlement (5000-4500 BCE) displayed a culture similar to that of Khvalynsk on the Volga. This may be the period when R1b started interacting and blending with the R1a population of the steppes.
The Yamna and Maykop people both used kurgan burials, placing their dead in a supine position with raised knees and oriented along a north-east/south-west axis. The floors of graves were sprinkled with red ochre, and sacrificed domestic animals were buried alongside humans. They also had in common horse riding, wagons, a cattle- and sheep-based economy, the use of copper/bronze battle-axes (both hammer-axes and sleeved axes) and tanged daggers. In fact, the oldest wagons and bronze artefacts are found in the North Caucasus, and spread from there to the steppes.
Maykop was an advanced Bronze Age culture, actually one of the very first to develop metalworking, and therefore metal weapons. The world's oldest sword was found at a late Maykop grave in Klady kurgan 31. Its style is reminiscent of the long Celtic swords, though less elaborate. Horse bones and depictions of horses already appear in early Maykop graves, suggesting that the Maykop culture might have been founded by steppe people or by people who had close links with them. However, the presence of cultural elements radically different from the steppe culture in some sites could mean that Maykop had a hybrid population. Without DNA testing it is impossible to say if these two populations were an Anatolian R1b group and a G2a Caucasian group, or whether R1a people had settled there too. The two or three ethnicities might even have cohabited side by side in different settlements. Typical Caucasian Y-DNA lineages (such as G2a) do not follow the pattern of Indo-European migrations, so intermarriages must have been limited, or at least restricted to Indo-European men taking Caucasian wives rather than the other way round. Only a small minority of G2a3b1 lineages are typically found alongside R1b in Indo-European populations.
Maykop people are the ones credited with the introduction of primitive wheeled vehicles (wagons) from Mesopotamia to the steppes. This would revolutionise the way of life in the steppe, and would later lead to the development of (horse-drawn) war chariots around 2000 BCE. Cavalry and chariots played a vital role in the subsequent Indo-European migrations, allowing them to move quickly and easily defeat anybody they encountered. Combined with advanced bronze weapons and their sea-based culture, this makes the western branch (R1b) of the Indo-Europeans from the Black Sea shores an excellent candidate for being the mysterious Sea Peoples, who raided the eastern shores of the Mediterranean during the second millennium BCE.
The rise of the IE-speaking Hittites in Central Anatolia happened a few centuries after the disappearance of the Maykop and Yamna cultures. Considering that most Indo-European forms of R1b found in Anatolia today belong to the R1b-Z2103 subclade, there is little doubt that the Hittites came to Anatolia via the Balkans, after Yamna/Maykop people invaded Southeast Europe. The Maykop and Yamna cultures were succeeded by the Srubna culture (1600-1200 BCE), possibly representing an advance of R1a1a people from the northern and eastern steppes towards the Black Sea shores, filling the vacuum left by the R1b tribes who migrated to Southeast Europe and Anatolia.
Migration map of Y-haplogroup R1b from the Paleolithic to the end of the Bronze Age (c. 1000 BCE)
The Siberian & Central Asian branch (M73)
When R1b crossed the Caucasus in the Late Neolithic, it split into two main groups. The western one (M269) would settle the eastern and northern shores of the Black Sea. The eastern one (M73) migrated to the Don-Volga region, where horses were domesticated circa 4600 BCE. R1b-M73 probably mixed with indigenous R1a people and founded the Repin culture (3700-3300 BCE) a bit before the Yamna culture came into existence in the western Pontic Steppe. R1b-M73 would then have migrated with horses along the Great Eurasian Steppe as far as the Altai mountains in East-Central Asia, where they established the Afanasevo culture (c. 3600-2400 BCE). Afanasevo people might be the precursors of the Tocharian branch of Indo-European languages alongside haplogroup R1a (=> see Tarim mummies).
The R1b-M73 people who stayed in the Volga-Ural region were probably the initiators of the Poltavka culture (2700-2100 BCE), then became integrated into the R1a-dominant Sintashta-Petrovka culture (2100-1750 BCE) linked to the Indo-Aryan conquest of Central and South Asia (=> see R1a for more details).
Nowadays R1b-M73 occurs almost exclusively around the Caucasus, in Russia and in very specific Central Asian populations. The highest percentages were observed among the Uyghurs (20%) of Xinjiang in north-west China, the Yaghnobi people of Tajikistan (32%), and the Bashkirs (47%, or 55% in the Abzelilovsky district) of Bashkortostan in Russia (border of Kazakhstan).
The European branch (M269)
The Indo-Europeans' bronze weapons and horses would have given them a tremendous advantage over the autochthonous inhabitants of Europe, namely the native haplogroup I (descendant of Cro-Magnon), and the early Neolithic herders and farmers (G2a, J, E1b1b and T). This allowed R1a and R1b to replace most of the native male lineages (=> see How did R1b come to replace most of the older lineages in Western Europe?), although female lineages seem to have been less affected.
A comparison with the Indo-Iranian invasion of South Asia shows that 40% of the male lineages of northern India are R1a, but less than 10% of the female lineages could be of Indo-European origin. The impact of the Indo-Europeans was more severe in Europe because European society 4,000 years ago was less developed in terms of agriculture, technology (no bronze weapons) and population density than that of the Indus Valley civilization. This is particularly true of the native Western European cultures where farming arrived much later than in the Balkans or central Europe. Greece, the Balkans and the Carpathians were the most advanced of European societies at the time and were the least affected in terms of haplogroup replacement. Native European Y-DNA haplogroups (I1, I2) also survived better in regions that were more difficult to reach or less hospitable, like Scandinavia, southern Switzerland, Sardinia or the Dinaric Alps.
The Conquest of "Old Europe" and Northern Europe
The first forays of steppe people into the Balkans happened between 4200 BCE and 3900 BCE, when horse riders crossed the Dniester and Danube and apparently destroyed the towns of the Gumelnita, Varna and Karanovo VI cultures in Eastern Romania and Bulgaria. A climatic change resulting in colder winters during this exact period probably pushed steppe herders to seek milder pastures for their stock, while failed crops would have led to famine and internal disturbance within the Danubian and Balkan communities. The ensuing Cernavoda culture (Copper Age, 4000-3200 BCE), Coțofeni culture (Copper to Bronze Age, 3500-2500 BCE) and Ezero culture (Bronze Age, 3300-2700 BCE), in modern Romania, seem to have had a mixed population of steppe immigrants and people from the old tell settlements. These steppe immigrants were likely a mixture of both R1a and R1b lineages, with probably a higher percentage of R1a than in the Yamna-era invasions.
The steppe invaders would have forced many Danubian farmers to migrate to the Cucuteni-Tripolye towns in the eastern Carpathians, causing a population boom and a north-eastward expansion as far as the Dnieper valley, bringing Y-haplogroups G2a, E1b1b, J and T into what is now central Ukraine. This precocious Indo-European advance westward was fairly limited, due to the absence of bronze weapons and of an organised army at the time, and was indeed only possible thanks to climatic catastrophes. The Carpathian, Danubian, and Balkan cultures were too densely populated and technologically advanced to allow for a massive migration.
The forest-steppe R1a people successfully penetrated into the heart of Europe with little hindrance, due to the absence of developed agrarian societies around Poland and the Baltic. The Corded Ware (Battle Axe) culture (3200-1800 BCE) was a natural western expansion of the Yamna culture, reaching as far west as Germany and as far north as Sweden and Norway. DNA analysis from the Corded Ware culture site of Eulau confirms the presence of R1a (but not R1b) in central Germany around 2600 BCE. The Corded Ware migrants might well have expanded from the forest-steppe, or the northern fringe of the Yamna culture, where R1a lineages were prevalent over R1b ones.
The expansion of R1b people into Old Europe was slower, but proved inevitable. In 2800 BCE, by the time R1a had reached Scandinavia, the Bronze Age R1b cultures had barely moved into the Pannonian steppe. They established major settlements in the Great Hungarian Plain, the most similar habitat to their ancestral Pontic Steppes. Around 2500 BCE, they were poised for their next major expansion into modern Germany and Western Europe. By that time, the R1b immigrants had blended thoroughly with the indigenous Mesolithic and Neolithic populations of the Danubian basin, where they had now lived for 1,700 years.
The strongly patriarchal Indo-European elite remained almost exclusively R1b on the paternal side, but absorbed a high proportion of non-Indo-European maternal lineages. Hybridised, the new Indo-Europeans would have lost most of their remaining Proto-Europoid or Mongoloid features. Their light hair, eye and skin pigmentation, once interbred with the darker inhabitants of Old Europe, became more like that of modern Southern Europeans. The R1a people of the Corded Ware culture would come across far less populous societies in Northern Europe, mostly descended from the lighter Mesolithic population (haplogroups I1 and I2), and therefore retain more of their original pigmentation (although facial traits evolved considerably in Scandinavia, where the I1 inhabitants were strongly dolichocephalic and long-faced, as opposed to the brachycephalic and broad-faced steppe people).
The Conquest of Western Europe
The R1b conquest of Europe happened in two phases. For nearly two millennia, starting from circa 4200 BCE, steppe people limited their conquest to the rich Chalcolithic civilisations of the Carpathians and the Balkans. These societies possessed the world's largest towns, notably the tell settlements of the Cucuteni-Tripolye culture. Nothing incited the R1b conquerors to move further into Western Europe at such an early stage, because most of the land north and west of the Alps was still sparsely populated woodland. The Neolithic did not reach the British Isles and Scandinavia before circa 4000 BCE. Even northern France and most of the Alpine region had been farming or herding for less than a millennium. Northwest Europe remained a tribal society of hunter-gatherers practising only limited agriculture for centuries after the conquest of the Balkans. Why would our R1b "conquistadors" leave the comfort of the wealthy and populous Danubian civilisations for the harsh conditions that lay beyond? Bronze Age people wanted metal, tin, copper, and gold, of which the Balkans had plenty, but that no one had yet discovered in Western Europe.
R1b-L51 is thought to have arrived in central Europe (Hungary, Austria, Bohemia) around 2500 BCE, approximately two millennia after the shift to the Neolithic in these regions. Agrarian towns had started to develop. Gold and copper had begun to be mined. The prospects of a conquest were now far more appealing.
The archeological and genetic evidence (distribution of R1b subclades) points to several consecutive waves towards eastern and central Germany between 2800 BCE and 2300 BCE (beginning of the Unetice culture). Unetice was probably the first culture dominated by R1b-L11 lineages. It is interesting to note that the Unetice period happens to correspond to the end of the Maykop (2500 BCE) and Kemi Oba (2200 BCE) cultures on the northern shores of the Black Sea, and their replacement by cultures descended from the northern steppes. It can therefore be envisaged that the (mostly) R1b population from the northern half of the Black Sea migrated westward due to pressure from other Indo-European people (R1a) from the north, like the burgeoning Proto-Indo-Iranian branch, linked to the contemporary Poltavka and Abashevo cultures.
It is doubtful that the Beaker culture (2800-1900 BCE) was already Indo-European, because it was a continuation of the native Megalithic cultures. It is more likely that the beakers and horses found across western Europe during that period were the result of trade with neighbouring Indo-European cultures, including the first wave of R1b into central Europe. It is equally possible that the Beaker people were R1b merchants or explorers who travelled across Western Europe and brought back tales of riches poorly defended by Stone Age people and waiting to be conquered by the more advanced Indo-Europeans, with their bronze weapons and horses.
What is undeniable is that the following Unetice (2300-1600 BCE), Tumulus (1600-1200 BCE), Urnfield (1300-1200 BCE) and Hallstatt (1200-750 BCE) cultures were linked to the spread of R1b to Europe, as they abruptly introduced new technologies and a radically different lifestyle.
The Hittites (c. 2000-1178 BCE) were the first Indo-Europeans to defy (and defeat) the mighty Mesopotamian and Egyptian empires. There are two hypotheses regarding the origins of the Hittites. The first is that they came from the eastern Balkans and invaded Anatolia by crossing the Bosphorus. That would mean that they belonged either to the L23 or the Z2103 subclade. The other plausible scenario is that they were an offshoot of the late Maykop culture, and that they crossed the Caucasus to conquer the Hattian kingdom (perhaps after being displaced from the North Caucasus by the R1a people of the Catacomb culture). In that case the Hittites might have belonged to the R1b-M269 subclade. The first hypothesis has the advantage of having a single nucleus, the Balkans, as the source of the post-Yamna expansion of all Indo-European R1b. The Maykop hypothesis, on the other hand, would explain why the Anatolian branch of IE languages (Hittite, Luwian, Lydian, Palaic) is so archaic compared to other Indo-European languages, which originated in Yamna instead of Maykop.
There is substantial archaeological and linguistic evidence that Troy was an Indo-European city associated with the steppe culture and haplogroup R1b. The Trojans were Luwian speakers related to the Hittites (hence Indo-European), with proven cultural ties to the culture of the Pontic-Caspian steppe. The first city of Troy dates back to 3000 BCE, right in the middle of the Maykop period. Troy might have been founded by Maykop people as a colony securing the trade routes between the Black Sea and the Aegean. The founding of Troy happens to coincide exactly with the time the first galleys were made. Considering the early foundation of Troy, the most likely of the two Indo-European paternal haplogroups would be R1b-M269 or L23.
The Phrygians and the Proto-Armenians are two other Indo-European tribes stemming from the Balkans. Both appear to have migrated to Anatolia around 1200 BCE, during the 'great upheavals' of the Eastern Mediterranean (see below). The Phrygians (or Bryges) founded a kingdom (1200-700 BCE) in west central Anatolia, taking over most of the crumbling Hittite Empire. The Armenians crossed the whole of Anatolia as far as Lake Van and settled in the Armenian Highlands. Nowadays 30% of Armenians belong to haplogroup R1b, the vast majority to the L23 subclade (=> see The Indo-European migrations to Armenia).
Most of the R1b found in Greece today is of the Balkanic L23 variety. There is also a minority of Proto-Celtic S116/P312 and of Italic/Alpine Celtic S28/U152. L23 could have descended from Albania or Macedonia during the Dorian invasion (see below), thought to have happened in the 12th century BCE. The invaders' language appears to have been close enough to Mycenaean Greek to be mutually intelligible and easy for locals to adopt. The Mycenaeans might have brought some R1b (M269 or L23) to Greece, but their origins can be traced back through archaeology to the Catacomb culture and the Seima-Turbino phenomenon of the northern forest-steppe, which would make them primarily an R1a1a tribe.
Greek and Anatolian S116 and some S28 lineages could be attributed to the La Tène Celtic invasions of the 3rd century BCE. The Romans also certainly brought S28 lineages, and probably also the Venetians later on, notably on the islands. Older clades of R1b, such as P25 and V88, are only a small minority and would have come along with E1b1b, G2a and J2 from the Middle East.
R1a is thought to have been the dominant haplogroup among the northern and eastern Indo-European speakers who evolved into the Indo-Iranian, Mycenaean Greek, Thracian, Baltic and Slavic branches. The Proto-Indo-Europeans originated in the Yamna culture (3300-2500 BCE), in the Pontic-Caspian steppe between modern Ukraine and south-west Russia. Their expansion is linked to the domestication of horses in the Eurasian steppes, and the invention of the chariot (see R1b above).
The eastern part of the Pontic-Caspian steppes is strongly associated with the Indo-Iranian and Balto-Slavic branches of Indo-European languages. Based on archeological, linguistic and genetic data, it is possible to say that the pastoralist nomads who lived in the northern Russian steppes and forest-steppes 5,000 years ago carried predominantly R1a paternal lineages.
Nowadays, high frequencies of R1a are found in Poland (55% of the population), Ukraine (40 to 65%), European Russia (45 to 65%), Belarus (49%), Slovakia (42%), Latvia (40%), Lithuania (38%), the Czech Republic (34%), Hungary (32%), Norway (27%), Austria (26%), Croatia (24%), north-east Germany (24%), Sweden (19%), and Romania (18%).
The Germanic branch
The first major expansion of R1a took place with the westward propagation of the Corded Ware (or Battle Axe) culture (2800-1800 BCE) from the northern forest-steppe in the Yamna homeland. This was the first wave of R1a into Europe, the one that brought the Z283 subclade to Germany and the Netherlands, and Z284 to Scandinavia.
The Germanic branch of Indo-European languages probably evolved from a merger of Corded-Ware R1a and the later arrival of R1b people from Central Europe. This is supported by the fact that Germanic people are an R1a-R1b hybrid, that these two haplogroups came via separate routes at different times, and that the Proto-Germanic language is closest to Proto-Italo-Celtic, but also shares similarities with Proto-Slavic. The Corded Ware R1a people would have mixed with the pre-Germanic I1 and I2 aborigines, which resulted in the first Indo-European culture in Germany and Scandinavia, although that culture could not be considered Proto-Germanic - it was simply Proto-Indo-European at that stage, or perhaps Proto-Balto-Slavic.
The R1b branch of the Indo-Europeans is thought to have originated in the southern Yamna culture (northern shores of the Black Sea). It was the first one to move from the steppes to Europe, invading the Danube delta around 4200 BCE, then making its way around the Balkans and the Hungarian plain in the 4th millennium BCE. It is likely that a minority of R1a people accompanied this R1b migration. Those R1a men would have belonged to the L664 subclade, the first to split from the Yamna core. These early steppe invaders were not a homogeneous group, but a cluster of tribes. It is possible that the R1a-L664 people were one or several separate tribes of their own, or that they mixed with some R1b lineages, notably R1b-U106, which would become the main Germanic lineage many centuries later. The R1b-R1a contingent moved up the Danube to the Pannonian plain around 2800 BCE, brought to an end the local Bell Beaker (circa 2200 BCE) and Corded Ware (c. 2400 BCE) cultures in Central Europe, and set up the Unetice culture (2300-1600 BCE) around Bohemia and eastern Germany. Unetice can be seen as the source of future Germanic, Celtic and Italic cultures, and is associated with the L11 subclade of R1b.
The late Unetice culture expanded to Scandinavia, founding the Nordic Bronze Age (1800-500 BCE). R1a-L664 and R1b (L11 and U106) presumably reached Scandinavia at this time. People from the Nordic Bronze Age probably spoke a Proto-Germanic language, which for over a thousand years acquired vocabulary from the indigenous Corded Ware language, itself a mixture of Proto-Balto-Slavic and non-IE pre-Germanic. The first genuine Germanic tongue has been estimated by linguists to have come into existence around (or after) 500 BCE, just as the Nordic Bronze Age came to an end, giving way to the Pre-Roman Iron Age. The uniqueness of some of the Germanic vocabulary points at borrowing from native pre-Indo-European languages (Germanic substrate theory). The Celtic language itself is known to have borrowed from Afro-Asiatic languages spoken by Near-Eastern immigrants to Central Europe. The fact that present-day Scandinavia is composed of roughly 40% of I1, 20% of R1a and 40% of R1b reinforces the idea that the Germanic ethnicity and language had acquired a tri-hybrid character by the Iron Age.
The Baltic branch
The Baltic branch is thought to have evolved from the Fatyanovo culture (3200-2300 BCE), the northeastern extension of the Corded Ware culture. Early Bronze Age R1a nomads from the northern steppes and forest-steppes would have mixed with the indigenous Uralic-speaking inhabitants (N1c1 lineages) of the region. This is supported by a strong presence of both R1a and N1c1 haplogroups from southern Finland to Lithuania and the adjacent part of Russia.
The Slavic branch
The origins of the Slavs go back to circa 3500 BCE with the northern Yamna culture. The M412 and Z280 lineages spread around Poland, Belarus, Ukraine and western Russia, and would form the core of the Proto-Slavic culture. The high prevalence of R1a in Balto-Slavic countries nowadays is not only due to the Corded Ware expansion, but also to a long succession of later migrations from Russia, the last of which took place from the 5th to the 10th century CE. The Slavic branch differentiated itself when the Corded Ware culture absorbed the Cucuteni-Tripolye culture (5200-2600 BCE) of western Ukraine and north-eastern Romania, which appears to have been composed primarily of I2a1b (M423) lineages descended directly from Paleolithic Europeans, with a small admixture of Near-Eastern immigrants (notably E1b1b, G2a, J and T). Thus emerged the hybrid Globular Amphora culture (3400-2800 BCE) in what is now Ukraine, Belarus and Poland. It is surely during this period that I2a1b, E-V13 and T spread (along with R1a) around Poland, Belarus and western Russia, explaining why eastern and northern Slavs (and Lithuanians) have between 10 and 20% of I2a1b lineages and about 10% of Middle Eastern lineages (18% for Ukrainians). After just a few centuries, this hybridised culture faded away into the dominant Corded Ware (2800-1800 BCE) and Catacomb (2800-1800 BCE) cultures.
The Corded Ware period was followed in the steppes by the Srubna culture (1800-1200 BCE), and around Poland by the Trzciniec culture (1700-1200 BCE). The last important Slavic migration is thought to have happened in the 6th century CE, from Ukraine to Poland, the Czech Republic and Slovakia, filling the vacuum left by eastern Germanic tribes who invaded the Roman Empire.
Historically, no other part of Europe was invaded a higher number of times by steppe peoples than the Balkans. Chronologically, the first R1a invaders came with the westward expansion of the Yamna culture (from 4200 BCE), a succession of steppe migrations that lasted about 2000 years. Then came the Thracians (1500 BCE), followed by the Illyrians (around 1200 BCE), the Huns and the Alans (400 CE), the Avars, the Bulgars and the Serbs (all around 600 CE), and the Magyars (900 CE), among others. These peoples originated from different parts of the Eurasian steppes, anywhere between Eastern Europe and Central Asia, which is why such high STR diversity is found within Balkanic R1a nowadays. It is not yet possible to determine the ethnic origin of each variety of R1a, apart from the fact that almost any R1a is associated with tribes from the Eurasian steppe at one point in history.
The Indo-Iranian branch
Proto-Indo-Iranian speakers, the people who later called themselves 'Aryans' in the Rig Veda and the Avesta, originated in the Sintashta-Petrovka culture (2100-1750 BCE), in the Tobol and Ishim valleys, east of the Ural Mountains. It was founded by pastoralist nomads from the Abashevo culture (2500-1900 BCE), ranging from the upper Don-Volga to the Ural Mountains, and the Poltavka culture (2700-2100 BCE), extending from the lower Don-Volga to the Caspian depression. The Sintashta-Petrovka culture was the first Bronze Age advance of the Indo-Europeans east of the Urals, opening the way to the vast plains and deserts of Central Asia as far as the metal-rich Altai mountains. The Aryans quickly expanded over all Central Asia, from the shores of the Caspian to southern Siberia and the Tian Shan, through trading, seasonal herd migrations, and looting raids.
Horse-drawn war chariots seem to have been invented by Sintashta people around 2100 BCE, and quickly spread to the mining region of Bactria-Margiana (modern border of Turkmenistan, Uzbekistan, Tajikistan and Afghanistan). Copper had been extracted intensively in the Urals, and the Proto-Indo-Iranians from Sintashta-Petrovka were exporting it in huge quantities to the Middle East. They appear to have been attracted by the natural resources of the Zeravshan valley, for a Petrovka copper-mining colony was established in Tugai around 1900 BCE, and tin was extracted soon afterwards at Karnab and Mushiston. Tin was an especially valued resource in the late Bronze Age, when weapons were made of a copper-tin alloy, stronger than the more primitive arsenical bronze. In the 1700s BCE, the Indo-Iranians expanded to the lower Amu Darya valley and settled in irrigation farming communities (Tazabagyab culture). By 1600 BCE, the old fortified towns of Margiana-Bactria were abandoned, submerged by the northern steppe migrants. The group of Central Asian cultures under Indo-Iranian influence is known as the Andronovo horizon, and lasted until 800 BCE.
The Indo-Iranian migrations progressed further south across the Hindu Kush. By 1700 BCE, horse-riding pastoralists had penetrated into Balochistan (south-west Pakistan). The Indus valley succumbed circa 1500 BCE, and the northern and central parts of the Indian subcontinent were taken over by 500 BCE. Westward migrations led Old Indic Sanskrit speakers riding war chariots to Assyria, where they became the Mitanni rulers from circa 1500 BCE. The Medes, Parthians and Persians, all Iranian speakers from the Andronovo culture, moved into the Iranian plateau from 800 BCE. Those that stayed in Central Asia are remembered by history as the Scythians, while the Yamna descendants who remained in the Pontic-Caspian steppe became known as the Sarmatians to the ancient Greeks and Romans.
The Indo-Iranian migrations have resulted in high R1a frequencies in southern Central Asia, Iran and the Indian subcontinent. The highest frequency of R1a (about 65%) is reached in a cluster around Kyrgyzstan, Tajikistan and northern Afghanistan. In India and Pakistan, R1a ranges from 15 to 50% of the population, depending on the region, ethnic group and caste. R1a is generally strongest in the North-West of the subcontinent, and weakest in the Dravidian-speaking South (Tamil Nadu, Kerala, Karnataka, Andhra Pradesh) and from Bengal eastward. Over 70% of the Brahmins (highest caste in Hinduism) belong to R1a1, due to a founder effect.
Maternal lineages in South Asia are, however, overwhelmingly pre-Indo-European. For instance, India has over 75% of "native" mtDNA M and R lineages and 10% of East Asian lineages. In the residual 15% of haplogroups, approximately half are of Middle Eastern origin. Only about 7 or 8% could be of "Russian" (Pontic-Caspian steppe) origin, mostly in the form of haplogroups U2 and W (although the origin of U2 is still debated). European mtDNA lineages are much more common in Central Asia though, and even in Afghanistan and northern Pakistan. This suggests that the Indo-European invasion of India was conducted mostly by men through war, and the first major settlement of women was in northern Pakistan, western India (Punjab to Gujarat) and northern India (Uttar Pradesh), where haplogroups U2 and W are the most common.
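The arithmetic behind the "7 or 8%" figure can be retraced with a minimal sketch. It assumes the rounded shares quoted above (75% native, 10% East Asian, half of the remainder Middle Eastern); the function name and the rounding are illustrative choices, not part of any published method.

```python
def residual_breakdown(native_pct, east_asian_pct, middle_eastern_share=0.5):
    """Split what is left after native and East Asian mtDNA lineages.

    All inputs are the approximate figures quoted in the text, not measured data.
    """
    # Whatever is neither native (M, R) nor East Asian
    residual = 100.0 - native_pct - east_asian_pct
    # About half of the residual is described as Middle Eastern
    middle_eastern = residual * middle_eastern_share
    # The rest is the upper bound for steppe-derived lineages (U2, W)
    steppe_upper_bound = residual - middle_eastern
    return {
        "residual": residual,
        "middle_eastern": middle_eastern,
        "steppe_upper_bound": steppe_upper_bound,
    }


india = residual_breakdown(native_pct=75.0, east_asian_pct=10.0)
print(india)
```

Since the native share is given as "over 75%", the steppe figure obtained this way is an upper bound rather than an estimate.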
Turkic speakers and R1a
The present-day inhabitants of Central Asia, from Xinjiang to Turkey and from the Volga to the Hindu Kush, speak in overwhelming majority Turkic languages. This may be surprising as this corresponds to the region where the Indo-Iranian branch of Indo-European speakers expanded, the Bronze-Age Andronovo culture, and the Iron-Age Scythian territory. So why is it that Indo-European languages only survive in Slavic Russia or in the southern part of Central Asia, in places like Tajikistan, Afghanistan or some parts of Turkmenistan? Why don't the Uyghurs, Uzbeks, Kazakhs and Kyrgyz, or the modern Pontic-Caspian steppe people (Crimean Tatars, Nogais, Bashkirs and Chuvash) speak Indo-European vernaculars? Genetically these people do carry Indo-European R1a, and to a lesser extent also R1b, lineages. The explanation is that Turkic languages replaced the Iranian tongues of Central Asia between the 4th and 11th century CE.
Proto-Turkic originated in Mongolia and southern Siberia with such nomadic tribes as the Xiongnu. It belongs to the Altaic linguistic family, like Mongolian and Manchu (some also include Korean and Japanese, although they share very little vocabulary in common). It is unknown when Proto-Turkic first emerged, but its spread started with the Hunnic migrations westward through the Eurasian steppe and all the way to Europe, only stopped by the boundaries of the Roman Empire.
The Huns were the descendants of the Xiongnu. Ancient DNA tests have revealed that the Xiongnu were already a hybrid Eurasian people 2,000 years ago, with mixed European and North-East Asian Y-DNA and mtDNA. Modern inhabitants of the Xiongnu homeland have approximately 90% of Mongolian lineages against 10% of European ones. The oldest identified presence of European mtDNA around Mongolia and Lake Baikal dates back to over 6,000 years ago.
It appears that Turkic quickly replaced the Scythian and other Iranian dialects all over Central Asia. Other migratory waves brought more Turkic speakers to Eastern and Central Europe, like the Khazars, the Avars, the Bulgars and the Turks (=> see 5000 years of migrations from the Eurasian steppes to Europe). All of them were in fact Central Asian nomads who had adopted Turkic language, but had little if any Mongolian blood. Turkic invasions therefore contributed more to the diffusion of Indo-European lineages (especially R1a1) than East Asian ones.
Turkic languages have not survived in Europe outside the Pontic-Caspian steppe. The Bulgarian language, despite being named after a Turkic tribe, is actually a Slavic tongue with a mild Turkic influence. Hungarian, sometimes mistaken for the heir of Hunnic because of its name, is in reality a Uralic language (Magyar). The dozens of Turkic languages spoken in the world today have a high degree of mutual intelligibility due to their fairly recent common origin and the nomadic nature of their speakers (until recently). The two main branches, Oghuz and Oghur, could be seen as two languages about as distant as Spanish and Italian, and the languages within each branch as regional dialects of Spanish and Italian.
The Greek branch
Little is known about the arrival of Proto-Greek speakers from the steppes. The Mycenaean culture commenced circa 1650 BCE and is clearly an imported steppe culture. The close relationship between Mycenaean and Proto-Indo-Iranian languages suggests that they split fairly late, some time between 2500 and 2000 BCE. Archeologically, Mycenaean chariots, spearheads, daggers and other bronze objects show striking similarities with the Seima-Turbino culture (c. 1900-1600 BCE) of the northern Russian forest-steppes, known for the great mobility of its nomadic warriors (Seima-Turbino sites were found as far away as Mongolia). It is therefore likely that the Mycenaeans descended from Russia to Greece between 1900 and 1650 BCE, where they intermingled with the locals to create a new unique Greek culture.
Haplogroup I (Y-DNA)
Haplogroup I is the oldest major haplogroup in Europe and in all probability the only one that originated there (apart from very minor haplogroups like C6 and deep subclades of other haplogroups). It is thought to have arrived from the Middle East as haplogroup IJ sometime between 40,000 and 30,000 years ago, and developed into haplogroup I approximately 25,000 years ago. In other words, Cro-Magnons most probably belonged to IJ and I (alongside older haplogroups like F and C6).
The earliest megalithic structures (5000-1200 BCE) of Europe were built by men belonging to haplogroup I, who were then joined by Neolithic newcomers such as G2a and E1b1b.
The I1 branch is estimated to have split away 20,000 years ago and evolved in isolation in Scandinavia during the late Paleolithic and Mesolithic. I1 is defined by at least 25 unique mutations, which indicates that this lineage experienced a serious population bottleneck. Men belonging to this haplogroup all descend from a single ancestor who lived between 10,000 and 7,000 years ago.
During the Mesolithic period, pre-I1 and I1 people were part of the successive Ertebølle culture (5300-3950 BCE), Funnelbeaker culture (4000-2700 BCE) and Pitted Ware culture (3200-2300 BCE). The latter two are sometimes considered as Neolithic cultures due to the introduction of farming. However, Neolithic farmers from Germany penetrated late into Scandinavia and in small numbers, and the lifestyle remained primarily one of hunter-gatherers. This is probably the reason why Scandinavia retained one of the most substantial shares of Paleolithic ancestry in Europe.
How did I1 become Germanic?
From 2800 BCE, a large-scale cultural and genetic upheaval hit Scandinavia with the arrival of the Indo-Europeans from Eastern Europe, who brought the Copper Age and Early Bronze Age practically without Neolithic transition. The first Indo-Europeans to reach Scandinavia were the Corded Ware people from modern Russia, Belarus and Poland, who are thought to have belonged predominantly to haplogroup R1a. These people carried maternal lineages similar to those of the Scandinavian I1 inhabitants - in great majority mtDNA haplogroups U4 and U5.
The second major Indo-European migration to Scandinavia was that of haplogroup R1b, the branch that is thought to have introduced Proto-Germanic languages, as an offshoot of the Proto-Celto-Germanic speakers from Central Europe. R1b probably entered Scandinavia from present-day Germany as a northward expansion of the late Unetice culture (2300-1600 BCE).
According to the Germanic substrate hypothesis, first proposed by Sigmund Feist in 1932, Proto-Germanic was a hybrid language mixing Indo-European (R1b, and to a lesser extent R1a) and pre-Indo-European (native Nordic I1) elements. This hybridisation would have taken place during the Bronze Age and given birth to the first truly Germanic civilization, the Nordic Bronze Age (1700-500 BCE).
Distribution of haplogroup I1 in Europe
The Germanic migrations dispersed I1 lineages to Britain (Anglo-Saxons), Belgium (Franks, Saxons), France (Franks, Visigoths and Burgundians), South Germany (Franks, Alamanni, Suebi, Marcomanni, Thuringii and others), Switzerland (Alamanni, Suebi, Burgundians), Iberia (Visigoths, Suebi and Vandals), Italy (Goths, Vandals, Lombards), Austria and Slovenia (Ostrogoths, Lombards, Bavarians), Ukraine and Moldova (Goths), as well as around Hungary and northern Serbia (Gepids). The I1 found among the Poles (6%), Czechs (11%), Slovaks (6%) and Hungarians (8%) is also the result of centuries of influence from their German and Austrian neighbours. The relatively high frequency of I1 around Serbia and western Bulgaria (5% to 10%) could be owed to the Goths who settled in the Eastern Roman Empire in the 3rd and 4th centuries.
The Danish and Norwegian Vikings brought more I1 to Britain, Ireland, the Isle of Man, Normandy, Flanders, Iberia, Sicily... The Swedish Vikings (Varangians) set up colonies in Russia and Ukraine, and outposts as far as the Byzantine Empire, the Caucasus and Persia. The higher frequency of I1 in Northwest Russia (east of the Baltic) hints at a particularly strong Varangian presence, which is concordant with the establishment of the Kievan Rus' by the Swedes.
I2 (M438/P215/S31) is thought to have originated during the Late Paleolithic, around the time of the Last Glacial Maximum, some 22,000 years ago. Its region of origin is undetermined at present. It could have been one of the Last Glacial Maximum refugia or somewhere in Anatolia or around the Caucasus. Three hypotheses are consequently possible.
The first scenario is that I2 originated in Europe. When the ice sheets started receding to the north from 20,000 to 12,000 years ago, the I2 hunter-gatherers re-expanded from their LGM refugium and colonised vast parts of western, central and Eastern Europe. In this hypothesis I2 would be associated with mtDNA haplogroups H1, H3, U5 and V, among others.
In the second scenario I2 originated in West Asia, but also colonised Europe when the ice sheets receded. In this hypothesis I2 would be mostly associated with mtDNA haplogroups J and T.
In the third and least likely scenario, I2 originated in West Asia but did not come to Europe until the Neolithic. There seem to have been several independent migrations of Neolithic farmers and herders from the Middle East to Europe, bringing lineages such as G2a, E1b1b, J and T. It is not yet clear at present whether each group brought only one or perhaps two haplogroups, or whether most migrations already comprised blends of many haplogroups. In this hypothesis I2 could be associated with mtDNA haplogroups N1a, R, HV, H (various subclades), J, T, K and X.
In the first two cases I2 would have been absorbed by Neolithic farmers in Southeast Europe (M423), Central Europe (P214, L596), and the western Mediterranean (M26). The relative success of specific branches of I2 seems to be linked to the diffusion of agriculture. The south-western I2a1a (M26) branch was absorbed by Neolithic farmers of the Printed-Cardium Pottery culture (5000-1500 BCE), whose descendants are found mostly in modern Sardinians and Basques. The eastern I2a1b (M423) is probably linked to the Cucuteni-Trypillian culture (4800-3000 BCE), which was the most advanced Neolithic culture in Europe before the Indo-European invasions in the Bronze Age. In contrast, central, northern and Western European I2 lineages (such as L38, M223, L1286, L1294 and L880) only survived at low frequencies. The reason could be that I2 hunter-gatherers adopted agriculture too late and were not numerous when the wave of Indo-Europeans took over central, northern and Western Europe (=> see R1b history).
Haplogroup I2a1a (M26)
I2a1a (M26, L158, L159.1/S169.1) was known as I1b2 until 2005, I1b1b in 2006-7, and I2a1 from 2008 to 2010. It is found in all Western Europe, and reaches maximum frequencies among the Sardinians (37.5%) and the Basques (5%), two population isolates. M26 is geographically restricted to the British Isles, the Low Countries, France, western Germany, Switzerland, Sardinia, Sicily, the west coast of Italy, Iberia and the Mediterranean coast of the Maghreb. The only M26 carriers negative for the L160 mutation are confined to Ireland.
I2a1a-M26 was probably one of the main paternal lineages of the Megalithic cultures of Western Europe during the Neolithic and Chalcolithic periods.
Haplogroup I2a1b (M423)
I2a1b (M423, L178) was known as I1b until 2007, and I2a2 from 2008 to 2010. The main subclade, representing over 90% of all M423 lineages is L621 and its subclade L147.2. The other subclades are L41.2 (very rare) and L161.1 (found mostly in Germany and the British Isles).
This branch is found overwhelmingly in Slavic countries. Its maximum frequencies are observed among the Dinaric Slavs (Slovenes, Croats, Bosniaks, Serbs, Montenegrins and Macedonians) as well as in Bulgaria, Romania, Moldova, western Ukraine and Belarus. It is also common to a lesser extent in Albania, Greece, Hungary, Slovakia, Poland, and south-western Russia. I2-L621 (L147.2+) is also known as I2a-Din (for Dinaric).
The high concentration of I2a1b-L621 in north-east Romania, Moldova and central Ukraine is reminiscent of the maximum spread of the Cucuteni-Trypillian culture (4800-3000 BCE) before it was swallowed by the Indo-European Corded Ware culture. This could mean that the Cucuteni-Tripolye culture was a native European group of hunter-gatherers who adopted farming after coming in contact (with perhaps some intermarriages) with the Middle Eastern farmers who settled in the Balkans (haplogroups E1b1b, G2a, J2b and T). After being Indo-Europeanized, I2a-L621 would have become the dominant paternal lineage among southern Slavs, while R1a remained dominant among northern Slavs.
The presence of I2a-L621 in Romania and Bulgaria could be attributed to the migration of the ancient Dacians and Thracians, who emerged as a mixture of indigenous peoples and Indo-Europeans (in this case, essentially R1a-Z280) sometime between 3300 and 1500 BCE. The Illyrians, who conquered the territory of former Yugoslavia circa 1200-1000 BCE, might have been an offshoot from the Dacians or the Thracians, or a closely related tribe from the Carpathian basin.
The second great expansion of I2a-Din took place with the Slavic migration in Late Antiquity and the Early Middle Ages. I2a-Din had started to mix with Proto-Indo-European R1a around Moldova, Ukraine, Belarus and Poland during the Corded Ware period (2900-2400 BCE), then disseminated more uniformly across Proto-Slavic tribes during the Bronze and Iron Ages. After Germanic tribes living in eastern Germany and Poland, like the Goths, the Vandals and the Burgundians, invaded the Roman Empire, the Slavs from further east filled the vacuum. Following the collapse of the Western Roman Empire in 476, the Slavs moved into the Dinaric Alps and the Balkans. By the 9th century the Slavs occupied all modern Slavic-speaking territories, apart from the eastern Balkans under the control of the Turkic-speaking Bulgars.
Nowadays northern Slavic countries have between 9% (Poland, Czech Republic) and 21% (Ukraine) of I2a-L621, while southern Slavs have between 20% (Bulgaria) and 50% (Bosnia). The higher percentage of I2a-Din in the south owes to the cumulative effect of Bronze Age and Early Iron Age migrations (Dacians, Thracians, Illyrians) and the medieval Slavic migrations. The relatively high percentage of I2a-L621 in non-Slavic people like the Hungarians (15%), Albanians (12%) and Greeks (9%) dates from the Bronze Age and population movement inside the Roman Empire which redistributed I2a beyond the original Daco-Thracian and Illyrian territories. Based on these frequencies, and the distribution of R1a subclades, it can be estimated that the Daco-Thracians and Illyrians carried approximately two to three times more I2a-Din than R1a, while the Early Slavs must have had roughly twice as much R1a as I2a-Din. The higher proportion of R1a in many northern Slavic countries today is due to earlier migrations of R1a during the Bronze Age (such as L260 among West Slavs and Z92 and Z93 among Russians and Belarusians).
Distribution of haplogroup I2a1 (formerly I2a) in Europe
Haplogroup I2a2 (P214)
I2a2 (S33/M436/P214, P216/S30, P217/S23, P218/S32, L35/S150, L37/S153, L181) was known as I1c until 2005 and I2b until 2010. It is associated with the pre-Celto-Germanic people of north-western Europe, such as the megalith builders (5000-1200 BCE). The wide variety of STR markers within I2a2 could make it as much as 13,000 years old.
I2a2 is found in all Western Europe, but apparently survived the Indo-European invasions (=> see R1b) better in northern Germany, and was reintroduced by both the La Tène Celtic expansion (5th to 1st century BCE) and the Germanic invasions (3rd to 6th century CE). Nowadays, I2a2 peaks in central and northern Germany (10-20%), the Benelux (10-15%) as well as in northern Sweden. It is also found in 3 to 10% of the inhabitants of Denmark, eastern England, and northern France. It is rarer in Norway, except in the south, where the Danish influence was the strongest historically.
Distribution of haplogroup I2a2 (formerly I2b) in Europe
Haplogroup I2a2a (M223)
I2a2a (formerly I2b1) amounts to over 90% of I2a2.
I2a2a1 (M284+) occurs almost exclusively in Britain, where it seemingly developed about 3,000 years ago.
I2a2a2 (L701+) has a very wide distribution. It is found in all Central Europe from Germany and the former Austrian Empire to Poland, Romania and Ukraine, but also in lower frequencies in Greece, Italy, France, Spain, England, Ireland, and Armenia. It could have been disseminated in part by the Goths. It is conspicuously absent from Scandinavia and Scotland. L701+ matches the I2 Continental 3 clade at Family Tree DNA.
I2a2a3 (Z161+) is commonly known as the I2 Continental clade (except Continental 3). It is the largest of the four subclades of I2a2a and is found predominantly in Germanic countries, with a particularly high concentration in Denmark, Germany, the Netherlands, England and in Northwest Sicily (Norman settlement). It is also found at lower densities throughout the rest of Europe, from Portugal to Russia. I2-Z161 is thought to have been propagated around Europe by the Danish Vikings (Britain, Normandy, Sicily), the Swedish Vikings (Baltic, Russia, Ukraine), the Goths (Moldova, Balkans, Italy, south-west France, Spain), the Suebi (Portugal and Galicia), the Lombards (attested by a hotspot in Campobasso, Molise), and the Franks (Rhineland, Belgium).
I2a2a4 (L1229+) is typical of England, Normandy (and other parts of France) as well as central and northern Germany. It is also found among English surnames in Ireland, although not Norman ones (but rather Anglo-Saxon ones). Its much higher density in Germany and England than in Denmark or France, and its absence from Sicily, indicate that it is probably an Anglo-Saxon lineage rather than Norman/Viking.
Haplogroup I2a2b (L38/S154)
I2a2b (formerly I2b2) has a distribution mostly limited to Alpine Italy (esp. Piedmont), Switzerland, the German Rhineland, the Harz mountains, the Low Countries, eastern France, and the British Isles (with the exception of Cornwall, Wales, Cumbria and the Scottish Highlands).
Four out of the six samples from the 3000-year-old Lichtenstein Cave in central Germany belonged to L38+. The cave was part of the Bronze Age Urnfield Culture. Based on the STR dating, it is believed that this lineage spread from Germany to England via Belgium in the Late Iron Age with the Celtic people of the La Tène Culture. I2a2b is therefore essentially an Alpine Celtic haplogroup.
The distribution of I2-L38 matches fairly well that of haplogroup R1b-U152 north of the Alps. Both haplogroups are also found at low frequency in Hungary, Romania, Bulgaria and central Turkey, probably reflecting the migration of La Tène Celts in the third century BCE (see map). R1b-U152 is associated with both the Central European Celts (Unetice, Urnfield, Hallstatt, La Tène) and the Italic people. I2-L38 being limited to the Alpine region in Italy, mostly the north-west where Gaulish tribes settled, it is likely that I2-L38 was brought to Italy by Celtic migrations many centuries after the arrival of Italic tribes from the Alpine Danube region. I2-L38 people would therefore have been autochthonous to the region between the Alps, Central Germany and the Low Countries and were assimilated into the Celtic society during the Hallstatt or La Tène period.
Haplogroup G is believed to have originated around the Middle East during the late Paleolithic, possibly as early as 30,000 years ago. At that time humans would all have been hunter-gatherers, and in most cases living in small nomadic or semi-nomadic tribes. Members of this haplogroup appear to have been closely linked to the development of early agriculture in the Levant part of the Fertile Crescent, starting 11,500 years before present. There has so far been ancient Y-DNA analysis from only four Neolithic cultures (LBK in Germany, Remedello in Italy and Cardium Pottery in Southwest France and Spain), and all sites yielded G2a individuals, which is the strongest evidence at present that farming originated with and was spread by members of haplogroup G.
So far, the only G2a people negative for subclades downstream of P15 or L149.1 were all from the South Caucasus region. The highest genetic diversity within haplogroup G is found between the Levant and the Caucasus, another good indicator of its region of origin. It is thought that early Neolithic farmers spread from the Levant westwards to Anatolia and Europe, eastwards to Mesopotamia and South Asia, and southwards to the Arabian peninsula and North and East Africa. The domestication of goats and cows first took place in the mountainous region of eastern Anatolia, including the Caucasus and Zagros. This is probably where the roots of haplogroup G2a (and perhaps of all haplogroup G) are to be found.
Distribution of haplogroup G in Europe, North Africa and the Middle East
Expansion of agriculture from the Middle East to Europe (9500-3800 BCE)
Nowadays haplogroup G is found all the way from Western Europe and Northwest Africa to Central Asia, India and East Africa, although everywhere at low frequencies (generally between 1 and 10% of the population). The only exceptions are the Caucasus region and Sardinia, where frequencies typically range from 15% to 30%.
Most Europeans belong to the G2a subclade, and most northern and western Europeans more specifically to G2a-L141.1 (or to a lesser extent G2a-M406). Nearly all G2b (L72+, formerly G2c) Europeans are Ashkenazi Jews. G2b has also been found around Afghanistan, probably as an offshoot of Neolithic farmers from the Levant.
Haplogroup G1 is found predominantly in Iran, but is also found in the Levant, among Ashkenazi Jews, and Central Asia (notably in Kazakhstan).
G2a makes up 5 to 10% of the population of Mediterranean Europe, but is fairly rare in Northern Europe. The only places where haplogroup G2 exceeds 10% of the population in Europe are Cantabria, Switzerland, the Tyrol, south-central Italy (Molise, Central and Southern Apennine), Sardinia, northern Greece (Thessaly) and Crete - all mountainous and relatively isolated regions.
Neolithic mountain herders
It has now been proven by the testing of Neolithic remains in various parts of Europe that haplogroup G2a was one of the lineages of Neolithic farmers and herders who migrated from Anatolia to Europe between 9,000 and 6,000 years ago. In this scenario the West Asian migrants would have brought with them sheep and goats, which were domesticated south of the Caucasus about 12,000 years ago. This would explain why haplogroup G is more common in mountainous areas, be it in Europe or in Asia.
The geographic continuity of G2a from Anatolia to Thessaly to the Italian peninsula, Sardinia, south-central France and Iberia already suggested that G2a could be connected to the Printed-Cardium Pottery culture (5000-1500 BCE). Ancient DNA tests conducted on skeletons from a LBK site in Germany (who were L30+) as well as Printed-Cardium Pottery sites from Languedoc-Roussillon in southern France and from Catalonia in Spain all confirmed that Neolithic farmers in Europe belonged primarily to haplogroup G2a. Other haplogroups found so far in Neolithic Europe include E-V13, F and I2a1 (P37.2).
Ötzi the Iceman (see famous individuals below), who lived in the Italian Alps during the Chalcolithic, belonged to haplogroup G2a2a2 (L91), a relatively rare subclade found nowadays in the Middle East, southern Europe (especially Sicily, Sardinia and Corsica) and North Africa. G2a2 (PF3146) is otherwise found at low frequencies all the way from the Levant to Western Europe. Neolithic farmers in Europe would have belonged to G2a, G2a2 (+ subclades) and G2a3 (and at least the M406 subclade).
Nowadays G2a is found mostly in mountainous regions of Europe, for example, in the Apennine mountains (15 to 25%) and Sardinia (12%) in Italy, Cantabria (10%) and Asturias (8%) in northern Spain, Austria (8%), Auvergne (8%) and Provence (7%) in south-east France, Switzerland (7.5%), the mountainous parts of Bohemia (5 to 10%), Romania (6.5%) and Greece (6.5%). It may be because Caucasian farmers sought hilly terrain similar to their original homeland, perhaps well suited to the raising of goats. But it is more likely that G2a farmers escaped from Bronze-Age invaders, such as the Indo-Europeans, and found shelter in the mountains. For example G2a3a (M406) is found at relatively high frequencies in the southern Balkans, the Apennines and the Alps, in contrast with G2a3b (L141.1), which is found everywhere.
G2a-L141.1, the Indo-European branch of G2a
Contrary to other branches of G2a, which are more prevalent in mountainous areas, G2a3b (L141.1), and particularly the G2a3b1 (P303) subclade, is found uniformly throughout Europe, even in Scandinavia and Russia. More importantly, G2a3b and its subclades are also found in eastern Anatolia, the Caucasus, Central Asia and throughout India, especially among the upper castes, who represent the descendants of the Bronze Age Indo-European invaders. The combined presence of G2a3b1 across Europe and India is a very strong argument in favour of an Indo-European origin. The coalescence age of G2a3b1 also matches the time of the Indo-European expansion during the Bronze Age.
The homeland of R1b1a (P297) and Pre-Proto-Indo-European speakers is presumed to have lain in northern Anatolia and/or the North Caucasus. The Caucasus itself is a hotspot of haplogroup G. Therefore, it is entirely conceivable that a minority of Caucasian men belonging to haplogroup G (and perhaps also J2b) were integrated into the R1b community that crossed the Caucasus and established themselves on the northern and eastern shores of the Black Sea sometime between 7,000 and 4,500 BCE.
An alternative theory is that G2a3 (L30) came from Anatolia to eastern and central Europe during the Neolithic (a fact proven by ancient DNA tests). Once in Southeast Europe it split into two branches: G2a3a, who followed the Danube to Central Europe (LBK), and G2a3b, who migrated east to the Pontic Steppe and brought agriculture to the region. G2a3b would have mixed with the indigenous R1a people, then with R1b newcomers during the Chalcolithic and Bronze Age. By the time the Proto-Indo-Europeans started their massive expansion, G2a3b men (who apparently belonged overwhelmingly to G2a3b1 and its subclades) would have joined R1b-M269/L23 in the invasion of Old Europe from 4200 BCE (=> see R1b history). G2a3a would have been among the conquered populations of Old Europe, seeking refuge in mountainous areas.
Haplogroup J2 is thought to have appeared somewhere in the Middle East towards the end of the last glaciation, between 15,000 and 22,000 years ago. Its present geographic distribution argues in favour of a Neolithic expansion from the Fertile Crescent. This expansion probably correlated with the diffusion of domesticated cattle and goats (starting c. 8000-9000 BCE) from the Zagros mountains and northern Mesopotamia, rather than with the development of cereal agriculture in the Levant (which appears to be linked rather to haplogroups G2 and E1b1b). A second expansion of J2 could have occurred with the advent of metallurgy, notably copper working (from the Lower Danube valley, central Anatolia and northern Mesopotamia), and the rise of some of the oldest civilisations.
Quite a few ancient Mediterranean and Middle Eastern civilisations flourished in territories where J2 lineages were preponderant. This is the case of the Hattians, the Hurrians, the Etruscans, the Minoans, the Greeks, the Phoenicians (and their Carthaginian offshoot), the Israelites, and to a lesser extent also the Romans, the Assyrians and the Persians. All the great seafaring civilisations from the middle Bronze Age to the Iron Age were dominated by J2 men.
There is a distinct association of ancient J2 civilisations with bull worship. The oldest evidence of a cult of the bull can be traced back to Neolithic central Anatolia, notably at the sites of Çatalhöyük and Alaca Höyük. Bull depictions are omnipresent in Minoan frescos and ceramics in Crete. Bull-masked terracotta figurines and bull-horned stone altars have been found in Cyprus (dating back as far as the Neolithic, the first presumed expansion of J2 from West Asia). The Hattians, Sumerians, Babylonians, Canaanites, and Carthaginians all had bull deities (in contrast with Indo-European or East Asian religions). The sacred bull of Hinduism, Nandi, present in all temples dedicated to Shiva or Parvati, does not have an Indo-European origin, but can be traced back to the Indus Valley civilisation. Minoan Crete, Hittite Anatolia, the Levant, Bactria and the Indus Valley also shared a tradition of bull leaping, the ritual of dodging the charge of a bull. It survives today in the traditional bullfighting of Andalusia in Spain and Provence in France, two regions with a high percentage of J2 lineages.
Middle-Eastern and European J2a
J2a's strong presence in Italy is owed in great part to the migration of the Etruscans from western Anatolia to central and northern Italy, and to the Greek colonisation of southern Italy. Immigration from the eastern Mediterranean to Rome during the Roman Empire, then from Anatolia, Thrace and Greece during the Byzantine period (particularly in north-eastern Italy) further increased the incidence of J2 in the peninsula.
The Phoenicians, Jews, Greeks and Romans all contributed to the presence of J2a in Iberia. The particularly strong frequency of J2a and other Near Eastern haplogroups (J1, E1b1b, T) in the south of the Iberian peninsula suggests that the Phoenicians and the Carthaginians played a more decisive role than other peoples. This makes sense considering that they were the first to arrive and founded the greatest number of cities (including Gadir/Cadiz, Iberia's oldest city), and that their settlements match almost exactly the zone where J2 is found at a higher frequency in southern Andalusia.
The Romans probably helped spread haplogroup J2 within their borders, judging from the distribution of J2 within Europe (frequency over 5%), which bears an uncanny resemblance to the borders of the Roman Empire (once concessions are made for the Germanic invasions that appear to have lowered the frequency of J2 between Belgium and Switzerland).
The highest concentrations of J2a in Europe are found in Crete (32% of the population) and Calabria (26%). J2a-M319, one of the principal J2 subclades in Greece, Italy and Western Europe, reaches its maximum frequency in Crete (6-9%).
Within India, J2a is more common among the upper castes and decreases in frequency with the caste level. This can be explained by the assimilation of local J2a (and R2) people from Bactria and Pakistan by the R1a Indo-European warriors who descended from the Volga-Ural region of Russia (Sintashta culture) and established themselves for a few centuries in southern Central Asia, immediately north of the Hindu Kush (including the Oxus civilization) before moving on to conquer the Indian subcontinent. J2a would have reached Bactria with the expansion of Neolithic herders from the Middle East who then blended with the indigenous hunter-gatherers belonging chiefly to R2.
J2b has a quite different distribution from J2a. J2b seems to have a stronger association with the Neolithic and Chalcolithic cultures of Southeast Europe. It is particularly common in the Balkans, Central Europe and Italy, which is roughly the extent of the European Copper Age culture. Its maximum frequency is achieved around Albania, Kosovo, Montenegro and Northwest Greece - the part of the Balkans which best resisted the Slavic invasions in the Early Middle Ages.
The vast majority of J2b lineages belong to J2b2 and its subclades. While J2b* and J2b1 lineages are mostly restricted to the Caucasus, Anatolia and the Balkans, J2b2 is also found in the Pontic Steppe, in Central Asia and in South Asia, particularly in India. Its very low frequency in the Middle East though suggests that, unlike other J2 lineages it was not spread by a demic diffusion of the Neolithic lifestyle.
In many ways the distribution of J2b2 and its subclades is strongly reminiscent of G2a3b1 and its subclades. The most likely hypothesis is that both haplogroups colonised the Pontic Steppe region during the Neolithic, either crossing the Caucasus from eastern Anatolia or, more probably, expanding east from the flourishing cultures of 'Old Europe' (Thessalian Neolithic). J2b2 and G2a3b1 would have been integrated into the local R1a population, and later been joined by a larger contingent of R1b lineages coming from the North Caucasus (see R1b history).
Nowadays J2b2 is found chiefly in Southeast and Central Europe, but also in Russia and among the upper castes of India. All these elements reinforce the hypothesis that J2b2 and G2a3b1 were two minor lineages spread within an R1a-dominant population during the Indo-Aryan invasions of South Asia approximately 3,500 years ago.
Another conceivable possibility is that a minority of J2b2, G2a3b1 and R1b-M269 from the Caucasus region migrated to the Volga-Ural region in the early Bronze Age, spreading with them the Proto-Indo-European language and bronze technology to the Caspian steppe before the expansion of this new culture to Central and South Asia. The drawback of this hypothesis is that it doesn't explain why R1b lineages strongly outnumber J2b2 and G2a3b1 in Europe but not in South Asia.
Distribution of haplogroup J2 in Europe, the Middle East & North Africa
J1 is a Middle Eastern haplogroup, which probably originated in eastern Anatolia, near Lake Van in central Kurdistan. Eastern Anatolia being the region where goats, sheep and cattle were first domesticated in the Middle East, haplogroup J1 is almost certainly linked to the expansion of the pastoralist lifestyle throughout the Middle East and Europe. J1 can be divided into two main groups: the J1c3 (P58) subclade, and the other forms of J1 (J1*, J1a, J1b, J1c1 and J1c2).
J1c3 (J-P58) is by far the most widespread subclade of J1. It is a typically Semitic haplogroup, making up most of the population of the Arabian peninsula, where it accounts for approximately 40% to 75% of male lineages. J-P58 is also the Cohen Modal Haplotype. Roughly half of all Cohanim belong to the J-P58 subclade. In the Hebrew Bible the common ancestor of all Cohens is identified as Aaron, the brother of Moses.
J1c3 is thought to have expanded from eastern Anatolia to the Levant, Taurus and Zagros mountains and the Arabian peninsula at the end of the last Ice Age (12,000 years ago) with the seasonal migrations of pastoralists. Arabic speakers recolonised the Arabian peninsula in the Bronze Age from the north-west of the peninsula, close to modern Jordan. The rise of Islam in the 7th century CE played a major part in the re-expansion of J1c3 from Arabia throughout the Middle East, as well as to North Africa, and to a lesser extent to Sicily and southern Spain.
Other subclades of J1 are less well studied due to their much lower frequencies. Most of the J1 in the Caucasus, Anatolia and Europe is of the non-J1c3 variety. Other types of J1 most probably spread to Europe during the Neolithic. J1 is particularly common in mountainous regions of Europe (with the notable exception of the Alps and the Carpathians), like Greece, Albania, Italy, central France, and the most rugged parts of Iberia (Asturias, Cantabria, Castile-La Mancha) as well as those with the highest density of Neolithic settlements (Portugal and Andalusia).
Like haplogroup G, J1 might have been one of the principal lineages to bring domesticated animals to Europe. Both G and J1 reach their maximal frequencies in the Caucasus, some ethnic groups being almost exclusively J1 (Kubachi, Kaitak, Dargins), while others have extremely high levels of G (Shapsugs, North Ossetians). Most of the ethnic groups in the North Caucasus have between 20 and 40% of each haplogroup, which are by far their two dominant haplogroups. In the South Caucasus (Georgia, Armenia, Azerbaijan), haplogroup J2 comes into the admixture and is in fact slightly higher than either J1 or G.
Distribution of haplogroup J1 in Europe, the Middle East & North Africa
Haplogroup E1b1b (formerly E3b) represents the last major direct migration from Africa into Europe. It is believed to have first appeared in the Horn of Africa approximately 26,000 years ago and dispersed to North Africa and the Near East during the late Paleolithic and Mesolithic periods. E1b1b lineages are closely linked to the diffusion of Afroasiatic languages.
The highest genetic diversity of haplogroup E1b1b is observed in Northeast Africa, especially in Ethiopia and Somalia, which also have the monopoly of older and rarer branches like M281, V6 or V92. Ethiopians and Somalians belong mostly to the V22 and V32 (downstream of V12) subclades, but possess also a minority of M81, M123 and V42 subclades. Among the main subclades of E1b1b only V13 and V65 are absent from the Horn of Africa, and probably originated in northern Africa (V65) or the southern Levant (V13).
Haplogroup E1b1b may well have been associated with the earliest development of Neolithic lifestyle and the advent of agriculture, which is so far believed to have arisen in the Fertile Crescent, but could have developed earlier in parts of Northeast Africa now covered by the Sahara desert. Agriculture spread from the Near East to Europe, at first mostly ovicaprid and cattle herders. E1b1b men (accompanied by G2a, J and T men) appear to have been associated at least with the diffusion of Neolithic painted pottery from the Levant to the Balkans (Thessalian Neolithic), and with the Cardium Pottery culture (5000-1500 BCE) in the Western Mediterranean. The only concrete evidence for this at the moment is the presence of the E-V13 subclade, commonest in the southern Balkans today, at a 7000-year old Neolithic site in north-east Spain, which was tested by Lacan et al (2011). The African origin of some Neolithic cattle was confirmed by Decker et al (2013), who reported that Iberian and Italian cattle possess introgression from African taurine.
Distribution of haplogroup E1b1b in Europe, the Middle East and North Africa
Five major subclades of E1b1b (V12, V13, V22, M81, M123) originated in Northeast Africa before the Neolithic. Consequently most of them are present virtually in all regions where E1b1b is found. One exception is Norway, Sweden and Finland, where only E-V13 seems to be present.
The frequency of E subclades has varied geographically over time due to founder effects in Neolithic populations, i.e. the migration of a small group of settlers among whom one paternal lineage was much more common than any other. Examples of founder effects include E-V12 in southern Egypt, E-V13 in the Balkans, E-V32 in Somalia, E-V65 on the Mediterranean coast of Africa, and E-M81 in Northwest Africa.
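The founder effect described above can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: the source-population frequencies below are hypothetical illustrative values (not taken from any study), and the function names `sample_founders` and `mean_shift` are invented for this example. The sketch draws random groups of settlers and measures how far the frequency of one lineage among them drifts from its frequency in the source population.

```python
import random

# Hypothetical source-population frequencies (illustrative values only,
# not taken from any study).
SOURCE = {"E-V13": 0.10, "G2a": 0.40, "J2": 0.30, "T": 0.20}


def sample_founders(source, n_founders, rng):
    """Draw n_founders men at random from the source population and
    return the frequency of each lineage among them."""
    lineages = list(source)
    weights = [source[name] for name in lineages]
    drawn = rng.choices(lineages, weights=weights, k=n_founders)
    return {name: drawn.count(name) / n_founders for name in lineages}


def mean_shift(source, lineage, n_founders, trials, rng):
    """Average absolute difference, over many simulated migrations,
    between a lineage's frequency among the founders and in the source."""
    total = 0.0
    for _ in range(trials):
        founders = sample_founders(source, n_founders, rng)
        total += abs(founders[lineage] - source[lineage])
    return total / trials


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for size in (10, 100, 1000):
        shift = mean_shift(SOURCE, "E-V13", size, 500, rng)
        print(f"{size:>5} founders: mean shift of E-V13 = {shift:.3f}")
```

In this toy model the smaller the founding group, the larger the average shift, which is the mechanism by which a lineage that was minor in the source population could plausibly become dominant in a colonised region. Real demographic history involves many other factors (later growth, selection, subsequent migrations) that the sketch ignores.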
E1b1b1a1 (or E-M78, formerly E3b1a) is the most common variety of haplogroup E among Europeans and Near Easterners. E-M78 is divided into 4 main branches : E1b1b1a1 (E-V12), E1b1b1a2 (E-V13), E1b1b1a3 (E-V22) and E1b1b1a4 (E-V65), each subdivided in further subclades.
E-V13 is one of the major markers of the Neolithic diffusion of farming from the Levant. Like all the other subclades of E-M78, E-V13 originated in north-east Africa toward the end of the last Ice Age. Its frequency is now far higher in Greece, South Italy and the Balkans than anywhere else due to a founder effect among the Neolithic colonisers from the southern Levant. Archeological evidence shows that the region of Thessaly, in northern Greece, was the starting point (circa 6,000 BCE) for the diffusion of agriculture through the Balkans and the Danube basin, which spread as far west as northern France, and as far east as southwestern Russia. The modern distribution of E-V13 hints at a strong correlation with the Neolithic and Chalcolithic cultures of Old Europe, such as the Vinča, Boian (aka Giuleşti-Marişa), and Karanovo cultures. The genetic testing of three male samples from the LBK culture only revealed the presence of haplogroups F and G2a. The sample size was nevertheless too small to rule out that E1b1b was part of this culture.
E-V13 was later associated with the ancient Greek expansion and colonisation. Outside of the Balkans and Central Europe, it is particularly common in southern Italy, Cyprus and southern France, all part of the Classical ancient Greek world.
E-V22 is found primarily in western Ethiopia, northern Egypt and in the southern Levant. In Europe it is therefore associated with the Phoenicians and the Jews, in addition to the propagation of agriculture. The Phoenicians could have disseminated E-V22 to Sicily, Sardinia, southern Spain and the Maghreb, and the Jews to Greece and mainland Italy and Spain. However, the Mediterranean route for the diffusion of agriculture (see map above) went through the exact same regions. It is therefore impossible to know at present which of the two periods (Neolithic or Classical Antiquity) played the stronger role in the spread of V22 around the Mediterranean.
E-V12 is the most common subclade of M78 in southern Egypt (over 40% of the population), while its V32 subclade is the dominant paternal lineage in Somalia, southern Ethiopia and northern Kenya. The low presence of V12* in the Near East and across Europe (except Nordic countries) indicates that it was a minor Neolithic lineage accompanying E-V13. V32 has not been found outside Northeast Africa.
E-V65 is found chiefly in North Africa, with a maximum frequency (20-30%) observed in Libya, Tunisia and northern Morocco. The absence of V65 from the Horn of Africa means that it would have originated in North Africa. V65 has also been found at lower frequencies (0.5% to 5%) in Egypt, Greece, southern Italy, Sicily, and more interestingly among the Sardinians and the Basques, two population isolates with strong affinities with the Neolithic population of Europe. However, V65 has not been found in the Levant, the Balkans or in non-Mediterranean Europe, which disproves a Neolithic dispersal. Its strongly North African distribution and very minor presence in parts of southern Europe with historical links to North Africa would rather suggest that this lineage was brought to southern Europe by immigrants from North Africa. In the case of Italy this could have taken place any time from the Phoenician/Carthaginian period to the Vandal Kingdom of North Africa. In Greece, V65 could have come from the ancient colonies of Cyrenaica. In Iberia, V65 could have crossed the Strait of Gibraltar any time since the late Paleolithic.
E1b1b1a2 (E-M81, formerly E3b1b) is characteristic of the Berbers of Northwest Africa. In some parts of Morocco E1b1b1b can peak at 80% of the population. M81 is also found in Iberia, Italy and southern France, with the highest concentrations in southern Portugal (10%) and decreasing as we move north. One remarkable exception is the Pasiegos of Cantabria, who were found to carry 30% of E-M81, although from a small sample size (n=101).
E1b1b1a3 (E-M123) and its main branch E1b1b1c1 (E-M34) have a distribution strongly reminiscent of the diffusion of agriculture. E-M123 is most common in Ethiopia (5-10%), in the southern Levant (10-12% among the Palestinians and the Jews, 8% among the Bedouins, 5% in Lebanon), in North Africa (3-5%), in Anatolia (3-5%), and in Italy (1 to 8%). M123 appears to have originated in Ethiopia, then expanded as M34 from the Levant in all directions over the Middle East, North Africa, South Asia and southern Europe. The distribution of E-M123 matches almost exactly the early expansion of farming (see map above) during the Neolithic period. The frequencies of E-M123 seem to go hand in hand with those of haplogroup G2a, with the difference that G2a reaches its maximum frequency around the Caucasus and Anatolia, perhaps due to a founder effect. Within Europe, E-M123 follows more or less the distribution of E-V13, with the highest frequency (1 to 5%) observed in Greece, South Italy, the Balkans and the Danube basin, then fading towards Germany, Poland, Ukraine and Russia, where its frequency is under 1%.
Haplogroup T is a fairly rare lineage in Europe. It makes up only 1% of the population on most of the continent, except in Greece, Macedonia and Italy where it exceeds 4%, and in Iberia where it reaches 2.5%, peaking at 10% in Cadiz and over 15% in Ibiza. The maximal worldwide frequency for haplogroup T is observed in East Africa (Eritrea, Ethiopia, Somalia, Kenya, Tanzania) and in the Middle East (especially the South Caucasus, southern Iraq, south-west Iran, Oman and southern Egypt), where it accounts for approximately 5 to 15% of the male lineages. Besides these regions and Europe, T is found in isolated pockets as far as Central Asia, India, Cameroon, Zambia and South Africa. Its highest density is actually found among the Fulani people of Cameroon (18% of the population).
Distribution of haplogroup T in Europe, the Middle East and North Africa
Origins & History
Haplogroup T originated at least 30,000 years ago, making it one of the oldest haplogroups found in Eurasia, which may explain its vast dispersal around Africa and South Asia. It also makes its place of origin uncertain. T is descended from haplogroup K, the ancestor of most of the Eurasian haplogroups (L, N, O, P, Q, R and T), and whose origins are thought to lie in the Middle East or in Central Asia.
Although haplogroup T is more common today in East Africa than anywhere else, it almost certainly spread from the Fertile Crescent with the rise of agriculture. Indeed, the oldest subclades and the greatest diversity of T are found in the Middle East, especially around the Fertile Crescent. The higher frequency of T in East Africa would be due to a founder effect among Neolithic farmers from the Middle East.
The modern distribution of T in Europe strongly correlates with the Neolithic colonisation of Mediterranean Europe by Near-Eastern farmers, notably the Cardium Pottery culture (5000-1500 BCE).
During the Chalcolithic and Bronze Age haplogroup T would have been an important (though probably not dominant) lineage among ancient peoples such as the Sumerians and the Elamites.
The higher than average frequencies of haplogroup T in places like Cyprus, Sicily, Tunisia, Ibiza, Andalusia and the northern tip of Morocco suggest that haplogroup T could have been spread around the Mediterranean by the Phoenicians (1200-800 BCE), and that ancient Phoenicia seemingly had a higher incidence of T than Lebanon does today (5%).
While almost all subclades of T are found in the Middle East, most Europeans outside the Mediterranean belong to the subclades T1a2 (L131) and T1a1a1a (P77), which are also found in Anatolia. These subclades probably represent one of the Neolithic migrations from the Fertile Crescent to south-east Europe. They would then have spread around central and eastern Europe, as far north as the eastern Baltic. T1a2 has been found as far east as the Volga-Ural region of Russia and Xinjiang in north-west China. This branch probably penetrated into the Pontic-Caspian Steppe during the Neolithic (perhaps alongside G2a3b1 and J2b2) and became integrated into the indigenous R1a peoples before their expansion to Central Asia during the Bronze Age (=> see R1a-Z93).
Haplogroup T has been found at a relatively high frequency among the Tatars (5%) and Maris (2%) of the Volga-Ural region as well as in north-west Russia (3%) and Estonia (3.5%), suggesting that it may have been one of the principal lineages bringing the Neolithic to Uralic-speaking populations. Autosomal DNA tests have also identified unusually high percentages of Southwest Asian admixtures among the Finns (1 to 2.5%) and Lithuanians (1.5%), who otherwise lack West Asian or Caucasian admixture and possess hardly any Middle Eastern Y-DNA. This Southwest Asian admixture could be the trace of T lineages absorbed during the Neolithic.
N is found among Uralic speakers, from Finland to Siberia, and at minor frequencies as far as Korea and Japan. In Europe, haplogroup N is only found at high frequencies among modern Finns (58%), Lithuanians (42%), Latvians (38%), Estonians (34%) and northern Russians.
Haplogroup N is believed to have originated in Southeast Asia approximately 15,000 to 20,000 years ago, but the N1c1 subclade found in Europe likely arose in Southern Siberia circa 12,000 years ago, and spread to North-East Europe 10,000 years ago.
Haplogroup N1c1 is associated with the Kunda culture (8000-5000 BCE) and the Comb Ceramic culture (4200-2000 BCE), which evolved into Finnic and pre-Baltic people. The Indo-European Corded Ware culture (3200-1800 BCE) progressively took over the Baltic region and southern Finland from 2,500 BCE. The merger of the two gave rise to the hybrid Kiukainen culture (2300-1500 BCE). Modern Baltic people have a roughly equal proportion of haplogroup N1c1 and R1a, resulting from this merger of Uralic and Slavic cultures.
Haplogroup Q is found predominantly in Central Siberia, Central Asia and among Native Americans. Approximately 90% of pre-Columbian Native Americans belonged to haplogroup Q, and all descend from the branch Q1a2a1 (L54), including various subclades of Q1a2a1a1 (M3) and Q1a2a1a2 (Z780).
In Europe, haplogroup Q1a is believed to have been brought by the Huns, the Mongols and the Turks, who all originated in the Altai region, around modern Mongolia. Haplogroup Q has been identified in Iron Age remains from Hunnic sites in Mongolia by Petkovski et al. (2006) and in Xinjiang by Kang et al. (2013). Modern Mongols belong to various subclades of Q1a, including by order of frequency Q1a2a1c (L330), Q1a1a1 (M120), Q1a1b (M25) and Q1a2a* (L53).
Distribution of haplogroup Q in Europe
The Huns in Sweden ?
Götaland and Gotland in southern Sweden now have the highest frequency of haplogroup Q in Europe (5%) and almost all of it belongs to the Q1a2b1 (L527) subclade. The Romans reported that the Huns consisted of a small ruling elite and that their armies were made up mostly of Germanic warriors. Gotland and Götaland are the presumed homeland of the ancient Goths. In the 1st century CE, some Goths migrated from Sweden to Poland, then in the 2nd century settled on the northern shores of the Black Sea around modern Moldova. The Huns conquered the Goths in the Pontic Steppe in the 4th century, forcing some of them to flee the Dnieper region and settle in the Eastern Roman Empire (Balkans). It would not be improbable that some Goths and Huns moved back to southern Sweden, either before invading the Roman Empire, or after the fall of the Western Roman Empire, displaced by the Slavic migrations to Central Europe. After all, even ancient people felt nostalgia for their ancestral homeland and knew exactly where their ancestors of a few hundred years earlier had come from.
While Q1a is more Mongolian, Siberian and Native American, Q1b appears to have originated in Central Asia and migrated early to South Asia and the Middle East. The highest frequency of Q1b in Europe is found among Ashkenazi Jews (5%) and Sephardic Jews (2%), suggesting that Q1b was present in the Levant before the Jewish diaspora 2,000 years ago. Q1b is also found in Lebanon (2%), and in isolated places settled by the Phoenicians in southern Europe (Crete, Sicily, south-west Iberia). This means that Q1b must have been present in the Levant around 1200 BCE at the latest, a very long time before the Hunnic migrations. One hypothesis is that Q1b reached the Middle East alongside haplogroup R1a-Z93 with the Indo-Iranian migrations from Central Asia during the Late Bronze Age.
Haplogroup Q can be divided in the following subclades:
Haplogroup C (Y-DNA)
Haplogroup C is an extremely old lineage thought to have appeared before or soon after the first migration of Homo Sapiens outside Africa, some 70,000 years ago. Men belonging to haplogroup C would have departed from East Africa during the Ice Age and followed the coasts of the Indian Ocean, settling in the Arabian peninsula, the Indian subcontinent, south-east Asia, north-east Asia and Oceania.
The first group to split away was C-Z1426, which colonised the Middle East and South Asia. One branch (CTS11043) might have moved north to Central Asia, then split into two: one tribe moving west to Europe (haplogroup C-V20) while the other migrated to East Asia and survives only in Japan today (haplogroup C-M8). Haplogroup C-V20 probably represents the first migration of Homo Sapiens to Europe 45,000 years ago, and would therefore have been the first to come into contact with Neanderthals.
The second branch of C-Z1426 spread around South Asia, Southwest Asia, and Central Asia, where it is found at low frequencies nowadays (haplogroup C-M356).
During that time, other C tribes continued their eastward migration to south-east Asia, where they split in four main regional clusters. The first branch colonised Indonesia, Melanesia, Micronesia, and Polynesia (haplogroup C2-M38). A second branch would have gone south to Australia, where they became the Aborigenes (haplogroup C4-M347). Another settled in the highlands of New Guinea (haplogroup C-P55). The fourth branch went all the way up the north-east Asia (haplogroup C3-M217) and is found nowadays chiefly among the Mongols, tribes descended from the Mongols (Kalmyks, Hazaras) including Turkic people (Kazakhs, Kyrgyz, Uyghurs, Uzbeks, Tuvans, Yakuts), East Siberian tribes (Buryats, Chukchi, Itelmens, Nivkh, Tungusic peoples), Chinese (Han, Hui, Manchus, Oroqens, Tujia), Koreans and Japanese (especially the Ainus), but also among several indigenous peoples of North America, including some Na-Dené-, Algonquian-, or Siouan-speaking populations.
Haplogroup C is a very rare lineage in Europe. The few Europeans who belong to C belong either to the European C-V20, the Middle Eastern C-M356, or the Mongolian C3-M217. Haplogroup C3 has also been identified in one Hunnic skeleton from the Iron Age in present-day Mongolia. Its presence in Europe can therefore be linked to the Hunnic and Mongolian invasions, like haplogroup Q1a.
Haplogroup L (Y-DNA)
Haplogroup L is found mostly in West Asia and South Asia. Its overall frequency ranges between 5 and 15% in Pakistan and western India, with a peak of 23% among the Kalash of northwest Pakistan, and from 1 to 10% in central Asia (mostly in Uzbekistan, Tajikistan and Afghanistan). It is also found in the Middle East (5% in Lebanon, 4.5% in Turkish Kurdistan, 4% in Iran, 3% in Syria), in parts of the Caucasus (7% in Azerbaijan and Chechnya, 3% in Armenia and Ingushetia), and in isolated parts of Europe (3.5% in north-east Italy, from 0.2% to 1% in the Balkans and Greece, 0.5% in Flanders).
Haplogroup L is divided into four main subclades:
L1a (M27) is mostly found in India and Sri Lanka, with frequencies decreasing towards Pakistan, southern Iran and the Arabian peninsula. It has also been found in Piedmont (Italy), Rhineland (Germany) and Flanders (Belgium).
L1b (M317) is found chiefly in the South Caucasus, eastern Anatolia and Lebanon. It has also been found in South Tyrol, Russia and Central Asia. Its main subclade L1b1 (M349) has been found in Italy, Switzerland, Austria, Germany, Belgium, England, northern Ireland, and scattered around most of central and eastern Europe and the eastern Mediterranean. The presence of L1b and L1b1 in Europe probably dates back to the Neolithic period.
L1c (M357) is an essentially Gedrosian subclade, found among the Burusho, Kalash (L1c1-PK3 subclade), and Pashtuns of Pakistan and Afghanistan, but also among the Chechens in the north-east Caucasus. It is also found at low frequencies in other populations of Pakistan, in India, northern Iran, Georgia and Ingushetia. In Europe it has been found in Sicily.
At present L2 (L595) has been found exclusively in Europe (Greece, Italy, southern Germany, Russia) and in the South Caucasus.
Haplogroup H (Y-DNA)
Haplogroup H is typically found among Dravidian populations in the Indian subcontinent, especially in South India and Sri Lanka. In Europe it is found almost exclusively among the Gypsies (Romani), who belong predominantly (between 15% and 50%) to the H1a (M82) subclade of Indian origin. The highest frequencies of haplogroup H among non-Romani Europeans are found in regions with large Romani populations, such as Romania, Slovakia, the southern Balkans, and Andalusia, suggesting that these lineages are also of Romani origin. No subclade other than H1a has been found to date in Europe.
Haplogroup A (Y-DNA)
A is the oldest of all Y-DNA haplogroups. It originated in sub-Saharan Africa over 140,000 years ago, and possibly as much as 340,000 years ago if we include haplogroup A00. Modern populations with the highest percentages of haplogroup A are the Khoisan (such as the Bushmen) and the southern Sudanese.
There are only rare and isolated cases of European men belonging to haplogroup A. Commercial tests have identified a few Scottish and Irish families (surnames Boyd, Logan and Taylor) all belonging to the same A1b1b2 (M13) subclade. This subclade is normally found in East Africa (Ethiopia, Sudan), but has also been found in Egypt, the Arabian peninsula, Palestine, Jordan, Turkey, Sicily, Sardinia and Algeria. It was certainly brought to Europe by Levantine people, be it during the Neolithic or later (Phoenicians, Jews, immigration within the Roman Empire).
Haplogroup A1a* (M31) has been found in Finland, Norway and eastern England. This subclade is normally found along the west coast of Africa (Guinea-Bissau, Cape Verde, Mali, Morocco) and could have come to Europe during the Paleolithic. Indeed a few percent of sub-Saharan admixture was found among ancient DNA samples from Mesolithic Scandinavia tested by Skoglund et al. (2012).
All mtDNA haplogroups found in Europe descend from the N group, which is thought to represent one of the two initial migrations by modern humans out of Africa, some 60,000 to 80,000 years ago. Nowadays haplogroup N is only found at extremely low frequencies in various parts of Eurasia.
Unfortunately, the tiny size of mitochondrial DNA (approximately 16,500 base pairs as opposed to 60 million for Y-DNA) does not allow a very accurate tracing of ancestry. Mitochondrial haplogroups all arose during the Ice Age, a period when humans were nomadic hunter-gatherers, well before the establishment of cities and civilizations. Mitochondrial haplogroups are only linked to ethnicities at a continental level. Those associated with European descent are H, I, J, K, T, U, V, W and X (except the branch X2a, which is found among Native Americans). Deep subclades can be associated with more specific regions, but do not necessarily match historical ethnic and linguistic groups. One likely reason is that women, through whom mtDNA is passed, tended to marry outside their ethnic group more often than men (e.g. to secure an alliance between two tribes or kingdoms).
Chronological development of mtDNA haplogroups
Note that the age of mitochondrial haplogroups is much more difficult to estimate than that of Y-DNA haplogroups, due to the tiny sequence of mtDNA and the small number of mutations available. The error margin for the dates below is typically ±5,000 years, but could even exceed that for older haplogroups.
N => 75,000 years ago (arose in North-East Africa)
R => 70,000 years ago (in South-West Asia)
U => 60,000 years ago (in North-East Africa or South-West Asia)
pre-JT => 55,000 years ago (in the Middle East)
JT => 50,000 years ago (in the Middle East)
U5 => 50,000 years ago (in Western Asia)
U6 => 50,000 years ago (in North Africa)
U8 => 50,000 years ago (in Western Asia)
pre-HV => 50,000 years ago (in the Near East)
J => 45,000 years ago (in the Near East or Caucasus)
HV => 40,000 years ago (in the Near East)
H => over 35,000 years ago (in the Near East or Southern Europe)
X => over 30,000 years ago (in north-east Europe)
U5a1 => 30,000 years ago (in Europe)
I => 30,000 years ago (Caucasus or north-east Europe)
J1a => 27,000 years ago (in the Near East)
W => 25,000 years ago (in north-east Europe or north-west Asia)
U4 => 25,000 years ago (in Central Asia)
J1b => 23,000 years ago (in the Near East)
T => 17,000 years ago (in Mesopotamia)
K => 16,000 years ago (in the Near East)
V => 15,000 years ago (arose in Iberia and moved to Scandinavia)
H1b => 13,000 years ago (in Europe)
K1 => 12,000 years ago (in the Near East)
H3 => 10,000 years ago (in Western Europe)
Mitochondrial DNA of prehistoric Europeans
The testing of ancient DNA has helped establish how long each haplogroup has been in Europe. Only a few such tests have been successfully conducted so far. Mitochondrial DNA was extracted from the skeleton of a 28,000-year-old Cro-Magnon from southern Italy, and the haplogroup was determined as HV or pre-HV. Still preceding the Neolithic expansion from the Middle East, the 9,000-year-old Cheddar Man was found to belong to haplogroup U5a.
Autochthonous (Cro-Magnoid) Europeans must therefore have belonged at least to haplogroups HV (and its offspring H and V) as well as U5a, which also happen to be the most common mitochondrial haplogroups everywhere in Europe. It has been speculated that over half of the matrilineal lineages in Europe descend directly from Paleolithic Europeans. Their male counterpart is Y-DNA haplogroup I.
Haplogroup H is by far the most common all over Europe, amounting to about 40% of the European population. It is also found (though in lower frequencies) in North Africa, the Middle East, Central Asia, Northern Asia, as well as along the East coast of Africa as far as Madagascar.
H1, H3 and V are the most common subclades of HV in Western Europe. H1 peaks in Norway (30% of the population) and Iberia (18 to 25%), and is also high among the Sardinians, Finns and Estonians (16%), as well as Western and Central Europeans in general (10 to 12%) and North-West Africans (10 to 20%). H3 is commonest in Portugal (12%), Sardinia (11%), Galicia (10%), the Basque country (10%), Ireland (6%), Norway (6%), Hungary (6%) and southwestern France (5%). Haplogroup V reaches its highest frequency in northern Scandinavia (40% of the Sami), northern Spain, the Netherlands (8%), Sardinia, the Croatian islands and the Maghreb. It is likely that H1, H3 and V, along with haplogroup U5, were the main haplogroups of Western European hunter-gatherers living in the Franco-Cantabrian refuge during the last Ice Age, and repopulated much of Central and Northern Europe from 15,000 years ago.
Haplogroup H13 is most common in Sardinia and around the Caucasus. Its distribution is reminiscent of Y-DNA haplogroup G2a. The same is true of H2 to a lesser extent. This would suggest a Caucasian or Anatolian origin.
H5 and H7 are also common in the Caucasus, but their lower incidence around the Mediterranean, and higher frequency from Anatolia to the Alps via the Danube suggest a possible link with the spread of agriculture (Y-DNA E1b1b, J2 and T) or of the Indo-Europeans (R1b1b2).
Haplogroup U & K (mtDNA)
Haplogroup U is extremely old. It originated some 60,000 years ago at the boundary between North-East Africa and the Middle East, soon after the first Homo sapiens ventured out of Africa. This is why each of its top-level subclades (U1, U2, U3...) can be seen as a haplogroup in its own right. The main European subclades are U3, U4, U5 and U8/K. U1 is mostly found in the Middle East, U6 in North Africa, U7 from the Near East to India, and the rare U9 from Ethiopia and the Arabian peninsula to Pakistan.
Haplogroup U2 is found primarily in South Asia, but probably is of Indo-European origin as it is found at low frequencies throughout the Pontic-Caspian steppe and has been identified in a 30,000 year-old Cro-Magnon from the middle Don valley in Russia. It might have been the dominant haplogroup of the northern forest-steppe foragers who later became the Proto-Indo-Iranian speakers (see R1a above) and moved massively to Central and South Asia.
Haplogroup U3 is centered around the Black Sea, with a particularly strong concentration in the north-eastern part. It could be related to the ancient Indo-Europeans, and probably more to R1b than R1a.
Haplogroups U4 and W are more common in Eastern Europe, Central Asia and northern South Asia (around Tajikistan for U4, and Pakistan for W), which also suggests an affiliation with the Indo-Europeans (correlated to Y-DNA haplogroup R1a). The same is true of haplogroups I, W, T2 and U2e to a lesser extent.
Haplogroup U5 is the most common in Western and Northern Europe. DNA tests on ancient skeletons have shown that U5 was the principal mitochondrial haplogroup of Paleolithic and Mesolithic hunter-gatherers in Northern Europe. Ancient DNA tests conducted in Britain, Germany and Scandinavia indicate that the frequency of U5 has progressively declined over time through the Neolithic, Bronze Age, Iron Age and Middle Ages. Nowadays it remains most common in the far north of Europe, where the Mesolithic population has been least affected by subsequent migrations. For instance, 30 to 50% of the Sami people of northern Scandinavia belong to haplogroup U5b (and about 40% to haplogroup V, which is also of pre-Neolithic European origin).
Haplogroup K is the main subclade of U8. It is found throughout Europe and Western Asia, as far away as India. Its highest concentration is in North-West and Central Europe, Anatolia and the southern Arabian peninsula. It is believed to have first arisen somewhere between Egypt and Anatolia approximately 16,000 years ago (estimates range from 22,000 years to as little as 10,000 years before present). It has the largest number of subclades of any haplogroup in spite of its fairly recent age. K1a is the largest subclade. The relatively important presence of K1a in the Near East suggests that it predates the Neolithic migration to Europe. This has been supported by the ancient mtDNA from Neolithic sites. Haplogroup K was never found in Europe prior to the Neolithic, then suddenly appears at a frequency (17%) much higher than in modern Europeans and similar to that of the present-day Levant. Most of the Neolithic K belongs to the K1a subclade.
Most K1a4, K1a10, K1b, K1c and K2 subclades are typically European. K1a4 is also common in Anatolia and Greece, and could indeed have spread to the rest of Europe from there during the Neolithic period, along with haplogroups J and T (and Y-DNA haplogroups E1b1b, J2 and T). The Indo-Europeans from Anatolia could also have contributed to the propagation of K. K1a1b1a and K1a9 are found primarily among Ashkenazi Jews.
Haplogroup J & T (mtDNA)
Haplogroup J originated in the Middle East 45,000 years ago, making it one of the oldest mitochondrial haplogroups in Europe and the Middle East. It is usually associated with the spread of agriculture. Haplogroup J being so common in Central Asia and around the Caspian and Black Seas, it is likely to have also a connection with the Indo-Europeans, especially the migration of Y-DNA haplogroup R1b (see R1b history above). J1 is common throughout the Middle East, as far as Central Asia and around Ukraine. In the rest of Europe it is mostly confined to Germanic countries (mimicking the distribution of Y-DNA haplogroup I1). J2 is much rarer than J1. J2a is found homogeneously across most of Europe. J2b is more frequent around Anatolia and in South-East Europe.
Haplogroup T is thought to have originated in the Middle East or North-East Africa at least 12,000 years ago. It is found throughout Europe, the northern half of Africa to Central Asia and Siberia, with pockets in India and North-West China (Xinjiang). The highest concentration of T1 has been observed in North-East Africa, Anatolia and Bulgaria, which suggests a Neolithic diffusion from Egypt to the Balkans. T2, the most common subclade of T in Europe, is particularly common in North-East Europe and around the Aegean Sea. The overall distribution of haplogroup T points to an early Neolithic migration from North-East Africa to Eastern Europe, then a dispersal following the migration pattern of the Indo-Europeans (especially Y-DNA haplogroup R1a) to Europe and South Asia.
Haplogroup W (mtDNA)
Present at low frequencies in most of Europe, in Anatolia, around the Caspian Sea, and from the Indo-Pakistani border to Xinjiang, haplogroup W is one of the best maternal markers of Indo-European ancestry (mtDNA equivalent of R1a and R1b). Its highest frequency is in Ukraine, European Russia, Baltic countries and Finland (3 to 5% overall), as well as in northern Pakistan (15%), Punjab (9%) and Gujarat (12%). In India it is considerably more common among the upper castes and among Indo-European speakers (source).
Haplogroup I (mtDNA)
Haplogroup I has a similar distribution to haplogroup W, ranging from Europe to Pakistan and North-West India, with a characteristic presence in the Pontic steppes and around the Caspian Sea. Its origin very probably lies in the Proto-Indo-European cultures (mtDNA mirror of R1a and R1b). Haplogroup I is nearly absent in parts of Europe distant from the Pontic-Caspian steppes (Iberia, South-West France, Ireland) and strongest in Norway, southern Finland, Ukraine, Greece and western Anatolia.
Haplogroup X (mtDNA)
Haplogroup X is a very old and scattered haplogroup found all over Eurasia, North Africa as well as among Native North Americans. Its frequency rarely exceeds 5% of the population in any ethnic group, and is more often restricted to 1 or 2%. X1 is found almost exclusively in North Africa, while X2a is the only lineage present among Amerindians. X2b, X2c, X2d and X2e are found in Europe, Siberia and Central Asia. It is therefore possible that the latter are of Indo-European origin (R1b1b).
The strong presence of X2 around the Caucasus, progressively fading towards the Near East and Mediterranean, hints that it could be related to the spread of Y-DNA haplogroup G2a. Since R1b1b and G2a both have origins around the Caucasus, it is unsurprising to find X2 alongside these two Y-DNA haplogroups.
Haplogroup R is the main subclade of N, the one that was to generate the 6 most common European haplogroups (H, V, J, T, U, K). At the time of writing R subclades were numbered from R0 (a.k.a. pre-HV) to R31. Most of them are found in South Asia (R5, R6, R7, R8, R30, R31), Southeast Asia (R9, R21, R22, R24), East Asia (R9/F, R11/B), and even among Papuans (R14) and Australian Aborigines (R12). R0a peaks in the southern Arabian peninsula and is common among Arabs and Middle Easterners. R1a (not to be confused with the homonymous Y-chromosome haplogroup) is found among the Adygei people from the North Caucasus (related to the Maykop culture => see R1b section), Brahmins from northern India, northwestern Russians and Poles - basically all people closely related with the Indo-European expansion. R2 is found from northwest India and Pakistan to Iran, Georgia and Turkey. It could be connected to the Indo-Iranians.
Finno-Uralic people have an overall mtDNA admixture similar to other Europeans, with a higher percentage of W and U5b, and a small percentage of Siberian haplogroups such as N or A. The Sami are characterised by a high percentage of haplogroups U5b1 and V.
The Berbers are the indigenous population of north-west Africa. Although their Y-DNA is almost perfectly homogeneous, belonging to haplogroup E-M81, Berber maternal lineages show a much greater diversity, as well as regional disparity. At least half (and up to 90% in some regions) of the Berbers belong to some Eurasian lineages, such as H, HV, R0, J, T, U, K, N1, N2, and X2, mostly of Middle or Near Eastern origin. Between 5 and 45% of the Berbers have sub-Saharan mtDNA (L0, L1, L2, L3, L4, L5). There are only three native North African lineages, U6, X1 and M1, representing 0 to 35% of the people depending on the region.
Haplogroup U6 has been observed from Iberia and the Canary Islands to Senegal in the West, and from Syria to Ethiopia and Kenya in the East. It is also found at low density in Europe, though mostly limited to Iberia. Approximately 10% of all North Africans belong to this lineage.
The Gypsies (Romani people) originated in the Indian subcontinent and mixed with local populations in the Middle East and Eastern Europe over the centuries. About half of the Gypsy population belongs to haplogroup M, and more specifically M5 (reflected by Y-haplogroup H1a), which is otherwise exclusive to South Asia. The other mtDNA haplogroups found among the Gypsy community are mostly of Eastern European, Caucasian or Middle Eastern origin, such as H (H1, H2, H5, H9, H11, H20, among others), J (J1b, J1d, J2b), T, U3, U5b, I, W and X (X1b1, X2a1, X2f) (sources). The same diversity exists on the Y-DNA side (45% of H1a, followed by I1, I2a, J2a4b, E1b1b, R1b1b, R1a1a).
The list below is non-exhaustive and includes many of the numerous references linked on these websites. Some studies and databases not published on the Web were also used.