Eupedia Genetics


Major Y-chromosomal haplogroups are defined by gene-altering polymorphisms affecting fertility and reproductive success

Author: Maciamo Hay (Written on 1st October 2013. Last updated on 29 March 2017)

Y chromosome


The Y chromosome represents about 2% of the human genome in men, and its role is generally recognised as being limited mostly to the development of male characteristics, whether physical or behavioural. The SRY gene (sex-determining region Y) triggers testis development. Several other genes have recognised roles in male fertility, such as sperm production. As of late 2016, over 200 Y-chromosomal genes had been identified, although the exact function of many of them has yet to be determined. Y-chromosomal mutations have been associated with several conditions besides male fertility issues, including deafness (DFNY1 gene), short stature (SHOX gene), Léri-Weill dyschondrosteosis (abnormal shortening of the forearms and lower legs, LWD gene), Langer mesomelic dysplasia (a disorder of bone growth, LMD gene), and retinitis pigmentosa (RPY gene).

Although the study of the human genome is still in its infancy, variations in Y-chromosomal DNA have been reported to influence male behaviour, health risks and immunity. For example, Shoaib Shah et al. (2008) found that Pakistani men belonging to haplogroups R1a1 and R2 reported lower levels of aggression. Wang et al. (2011) reported that Y-haplogroup R1a was associated with a higher risk of prostate cancer. Y-haplogroup I has been linked to a substantially increased risk of coronary artery disease due to a downregulation of the UTY and PRKY genes in macrophages (Charchar et al. 2012 and Bloomer et al. 2013), while Sezgin et al. (2009) found that members of Y-DNA haplogroup I were more likely to suffer from rapid HIV progression once infected. It has also been reported that older men and smokers more often carry cells that lack a Y chromosome, and that such men have a higher risk of certain cancers and a shorter life expectancy (Dumanski et al. 2015).

In 2010, Jennifer F. Hughes and her colleagues reported in Nature that the chimpanzee and human Y chromosomes were remarkably divergent in structure and gene content. With a 30% difference between them, the Y chromosome appears to be one of the fastest-evolving parts of the human genome. What is driving this evolution? Male competition for procreation is the first thing that springs to mind. However, humans are relatively monogamous compared to the extremely promiscuous, non-pair-bonding chimpanzees. Natural selection of Y-chromosomal genes may therefore not be limited to increased testis size or sperm production, but may also involve physical or behavioural adaptations, such as greater physical strength, heightened aggressiveness or dominance, or even greater inventiveness.

The fast evolution of the Y chromosome driven by beneficial mutations within Y-DNA genes

The evolution of the human Y chromosome can be traced using SNPs accumulated from generation to generation. As the male-specific part of the Y chromosome does not recombine with the X chromosome (only the small pseudoautosomal regions at its tips do), men inherit essentially the same Y chromosome as their fathers, with the addition of a few de novo mutations. These accumulated mutations, or SNPs, make it possible to classify all human Y chromosomes in a grand evolutionary tree. Regional distribution patterns emerged, which allowed population geneticists to define haplogroups, i.e. the main branches of shared ancestry of various human populations on Earth. Although haplogroup determination was at first rather arbitrary and based essentially on geography, more detailed testing techniques revealed over the years that most of these haplogroups are in fact defined by hundreds of SNPs, which were typically the result of population bottlenecks endured during the Last Glacial Maximum or some other climatic or historical event, including later epidemics. But that is not all. What I am going to demonstrate here is that some of the most successful haplogroups of the last 10,000 years are apparently defined by mutations in the coding region of Y-chromosomal genes, as opposed to random mutations lying in non-coding DNA (introns within genes or intergenic regions between them).
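To make the idea of classifying Y chromosomes by accumulated SNPs more concrete, here is a minimal Python sketch, assuming a toy tree in which each branch is defined by a set of derived SNPs and a sample belongs to the deepest branch whose SNPs it carries. The class `Haplogroup`, the function `assign_haplogroup` and the tree itself are hypothetical illustrations, not a real genotyping tool; the SNP names are those cited in this article, but the topology is heavily simplified.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Haplogroup:
    name: str
    defining_snps: frozenset[str]  # derived alleles shared by every member of this branch
    children: list[Haplogroup] = field(default_factory=list)


def assign_haplogroup(node: Haplogroup, derived: set[str]) -> str | None:
    """Return the deepest branch whose defining SNPs are all derived in the sample."""
    if not node.defining_snps <= derived:
        return None  # the sample lacks this branch's mutations
    for child in node.children:
        deeper = assign_haplogroup(child, derived)
        if deeper is not None:
            return deeper  # keep descending while the sample carries the next SNPs
    return node.name


# Hypothetical, heavily simplified tree: intermediate nodes (e.g. R1, R1b, I) are
# omitted and each branch lists only one or two of its many defining SNPs.
TREE = Haplogroup("root", frozenset(), [
    Haplogroup("R", frozenset({"M207", "M306"}), [
        Haplogroup("R1a1", frozenset({"SRY10831.2"})),
        Haplogroup("R1b1b2", frozenset({"M269"}), [
            Haplogroup("R1b-L23", frozenset({"L49"})),
        ]),
    ]),
    Haplogroup("I1", frozenset({"M253", "M307.1"})),
    Haplogroup("I2", frozenset({"M438"})),
])

print(assign_haplogroup(TREE, {"M207", "M306", "M269", "L49"}))  # R1b-L23
print(assign_haplogroup(TREE, {"M253", "M307.1"}))               # I1
```

Real haplogroup assignment works on trees with tens of thousands of SNPs and has to tolerate missing calls and reversions such as SRY10831.2, which this sketch ignores.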

Since protein-coding genes represent only about 1% of the genome (whatever the chromosome), these mutations are far rarer. As genes have been shaped by millions of years of evolution, changes in their structure are often deleterious and can wreak havoc on the body, breaking genes and causing diseases and disabilities. In a minority of cases, beneficial mutations arise that increase fitness or enhance specific functions or capabilities. In the case of Y-chromosomal mutations, this could mean increased fertility, changes in sexual behaviour, more masculine physical traits, heightened dominance, or any of the other behavioural traits that differentiate men from women, any of which could have given their carriers an edge over other lineages. As they confer an evolutionary advantage, these beneficial mutations will be selected and spread quickly within a population.

At present, over 15,000 SNPs define human Y-chromosomal haplogroups and their subclades. Yet, among them, only 15 mutations have been identified in the coding region of Y-chromosomal genes. If such mutations affect male attributes or fertility, and natural selection allowed only the beneficial ones to survive, then one would expect these mutations to have become extremely common over time and therefore to define major haplogroups, as opposed to minor subclades that failed to thrive.

SRY mutations

One of the most important genes on the Y chromosome is SRY (sex-determining region Y). In rare cases when the SRY gene is translocated to the X chromosome, it causes XX male syndrome (a person with two X chromosomes who nevertheless develops as a male). Likewise, XY individuals with defects or deletions in the SRY gene develop female characteristics (Swyer syndrome). Mutations in the SRY gene are therefore bound to have serious effects on the carrier. It is perhaps not a coincidence, then, that several of the most successful Y-chromosomal haplogroups are defined by an SRY mutation. This is the case for:

  • Haplogroup BT (SRY1532.1, aka SRY10831.1) : the entire branch of humanity that split from haplogroup A some 70,000 years ago, when Homo sapiens first left Africa to colonise Eurasia. Some of the earliest descendants of BT were haplogroups C and D, which colonised most of Asia and Oceania, and probably also entered Europe.
  • Haplogroup E (SRY4064, aka SRY8299) : the most successful of all African male lineages, which also spread around the Middle East and Europe during the Neolithic.
  • Haplogroup M2a (SRY9138): found in Papua New Guinea and the Solomon Islands
  • Haplogroup O1b2 (SRY465) : the main branch of O1 found in Austronesian peoples of Taiwan, the Philippines, Indonesia, Melanesia, Micronesia, and Madagascar.
  • Haplogroup O2b (SRY465) : a major male lineage in Japan, Korea and Manchuria. It is associated with the Yayoi colonisation of Japan from Korea from 500 BCE.
  • Haplogroup R1a1 (SRY10831.2!, aka SRY1532.2!, a reversion of the BT mutation above) : a lineage which underwent one of the most spectacular expansions in human (pre)history soon after the SRY mutation emerged, conquering half of Europe and much of Asia, from Anatolia to Siberia and India. Note that R1a lineages lacking the SRY1532.2! mutation (R1a* defined by the M420 mutation) are almost extinct today.
  • Haplogroup R1b-M167 (SRY2627) : the most common R1b subclade in Catalonia, found in many parts of western Europe.

Other mutations in Y-chromosomal genes

Here are other apparently important mutations in the coding region of the Y chromosome.

  • Haplogroups B2b1 and K share the same mutation (50f2/C aka DYS7C) in a DYS (a unique Y-DNA segment, although not necessarily in the coding region). Note that K is the ancestral lineage of 80% of Eurasian paternal lineages.
  • Haplogroup B2b1a1 (MSY2.1) is the largest subclade of haplogroup B. It is found in central Africa.
  • Haplogroup DE (M1, aka YAP) is defined by the Y-chromosome Alu Polymorphism (YAP) insertion, the most well-known unique event polymorphism (UEP), estimated to have occurred 65,000 years ago. Haplogroup DE is ancestral to roughly 80% of the paternal lineages in Africa, 10% in western Eurasia and 40% in Japan.
  • Haplogroup I1 is the largest Nordic lineage and is associated with the Germanic migrations. Although it originated 27,000 years ago, I1 only started to expand from a single individual 5,000 years ago. As a result of this phylogenetic bottleneck, it is defined by over 300 accumulated mutations. The best known is M253 (rs9341296), which affects the DDX3Y gene, linked to male fertility. I1 is also defined by other gene-altering mutations, such as M307.1 (rs13447354) altering the EIF1AY gene, as well as P30 and P40 in the ARSDP1 pseudogene.
  • Haplogroup I2 is defined among others by M438/P215/S31 (rs17307294), a mutation in the NLGN4Y gene (neuroligin 4, Y-linked), which encodes a neuroligin, a membrane protein that mediates the formation and maintenance of synapses between neurons.
  • Haplogroup I2a1a-M26 (rs2032629) has a mutation altering the KDM5D gene, encoding an enzyme related to the immune system.
  • Haplogroups J (12f2.1) and D2 (12f2.2) are defined by the same 12f2 STS polymorphism, linked to the deletion of the L1PA4 element in the HERV15yq2 sequence block. HERV stands for human endogenous retrovirus. Recombinations in HERV15 have been linked to changes in fertility.
  • Haplogroup J1 is defined by 185 SNPs, including M267 (rs9341313) in the EIF1AY gene.
  • Over 90% of people belonging to haplogroup J2 belong to J2a1a-L26, also defined by the L27 mutation (rs34126399), which alters the GAPDHP17 pseudogene.
  • Haplogroup J2b is defined by 189 mutations, including M221 (rs2032667), which is found in the UTY gene (ubiquitously transcribed TPR gene). The UTY gene is expressed in the paraventricular nucleus of the hypothalamus, and mutations could potentially alter the secretion of hormones one way or another, including oxytocin (bonding hormone), vasopressin (social behaviour, sexual motivation and pair bonding) and ACTH (response to stress).
  • Haplogroup N1c is a major Siberian lineage and the main paternal lineage of Uralic people. It is defined by 57 mutations, the most famous being Tat/M46/Page70 (rs34442126), in the USP9Y gene. Defects in the USP9Y gene can cause azoospermia and infertility. Mutations in that gene that have been positively selected by evolution would, on the contrary, improve male fertility.
  • Haplogroup O1 is defined by MSY2.2, a mutation in the male-specific region of the human Y chromosome (MSY), which seems to recombine frequently with the X chromosome and is associated with spermatogenic functions. O1 is one of the most successful lineages in Southeast Asia and may have originated in southern China, where it still makes up 25% of male lineages.
  • Haplogroup R is the most successful paternal lineage in western Eurasia and accounts for approximately half of European, Central Asian and North Indian paternal lineages. It is defined by 56 mutations, including M207 (aka UTY2) in the UTY gene, and by M306 (rs1558843) altering the EIF1AY gene.
  • Haplogroup R1b1b2, the branch of R1b associated with the Proto-Indo-European expansion, is defined by 105 mutations, including M269 (rs9786153) in the EIF1AY gene. 99% of M269+ men belong to the R1b-L23 branch, which is also defined by L49 (rs9786142) affecting the ZFY gene. Like SRY, ZFY could be a sex-determining gene.
  • R1b-M222 (USP9Y+3636), the presumed lineage of Niall of the Nine Hostages (a 4th-century king of Ireland), is found chiefly in northern Ireland and southern Scotland. Originating less than 2,000 years ago, it now makes up 30% of the Irish population. Defects in the USP9Y gene can cause azoospermia and infertility. Considering the quick expansion of this haplogroup since the early Middle Ages, we can therefore assume that the USP9Y+3636 mutation improved fertility in the men who carry it.
  • Haplogroup T is defined by 240 mutations, including M184 (aka USP9Y+3178) affecting the USP9Y gene.

Obviously, some of the above mutations may be more beneficial to their carriers than others. Whereas some could be mildly favourable, others may provide a significant advantage. The success of a haplogroup can be assessed by how frequent it is now, especially in comparison to the past. In that regard, haplogroups J and R were actually the two most successful haplogroups in western Eurasia since the Bronze and Iron Ages. These two haplogroups managed to replace the majority of other haplogroups in the Middle East, one of the regions of the world that underwent the strongest natural selection due to the early development of agriculture in the Fertile Crescent, of metallurgy in Anatolia during the Copper and Iron Ages, and the rise of the oldest civilizations. The region also lies at the crossroads of Europe, Central Asia, South Asia and Africa, and suffered more invasions, wars and competition between human groups than anywhere else on Earth in the last 10,000 years. In this environment, haplogroups J (J1 and J2) and R (R1a and R1b) came out as the evolutionary winners, relegating the main Neolithic lineages of the Fertile Crescent (E1b1b, G2, H2, T) to a secondary position. Nor is it always the most recent invader that replaces the most lineages, since the Turkic and Mongol invasions only had a relatively minor impact on Y-DNA haplogroups.

In eastern Eurasia, three lineages make up over 80% of the male population: O1, O2a and O2b. While O2a owes its success to the development of agriculture in Neolithic China (Yan et al. 2014), the other two thrived despite competition from other lineages like C1, C2, D1, D2, D3, N, P and Q.

The dominant lineage in Africa is haplogroup E, which makes up over 50% of the continent's male lineages, even exceeding 80% in most of western Africa. Haplogroup T also represents a sizeable percentage of the population in the Horn of Africa, particularly in northern Somalia and Djibouti, where it can exceed 75% of the lineages. Haplogroup T is not native to Africa and came with Neolithic farmers from the Middle East. Besides haplogroups E and T, the two other African haplogroups are A and B, the two oldest human paternal lineages, which are divided into dozens of subclades. Unsurprisingly, the most common among them by far is B2b1a1, which benefits from two mutations in the coding region (50f2/C and MSY2.1). B2b1a1 has a remarkably wide distribution, being found in southern, central, eastern and northern Africa, but also in southern Iran, Pakistan and India.

Insertions and deletions within the Y-chromosome

Haplogroup-defining markers are not always substitutions (replacing one letter by another); some are insertions (adding one or several letters) or deletions (cutting out one or several letters). Whereas substitutions within a gene can be synonymous (the new codon encodes the same amino acid and therefore preserves the gene's function), insertions and deletions in a coding sequence are generally very disruptive: unless their length is a multiple of three, they cause frameshift mutations, potentially altering the whole downstream protein. Little information is currently available on the effect this may have on the Y chromosome. Nevertheless, most top-level haplogroups seem to be defined by insertions and/or deletions. Like coding-region mutations, ins/dels are extremely rare compared to random SNPs: fewer than 100 of them have been identified out of the 15,000+ Y-chromosomal SNPs. It would make sense if these ins/dels affected coding genes one way or another, although I have not been able to verify this at present.
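The difference between a synonymous substitution and a frameshift can be illustrated with a short Python sketch that translates a DNA sequence codon by codon using the standard genetic code. The 18-base sequence is a hypothetical toy example, not taken from any Y-chromosomal gene, and the sketch ignores start sites, splicing and stop handling that real translation involves.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ORIGINAL = "ATGGCTGAAAAACTGTGG"                 # hypothetical toy sequence
SYNONYMOUS = ORIGINAL[:5] + "C" + ORIGINAL[6:]  # GCT -> GCC, both encode alanine
IN_FRAME = ORIGINAL[:3] + ORIGINAL[6:]          # delete a whole codon (3 bases)
FRAMESHIFT = ORIGINAL[:3] + ORIGINAL[4:]        # delete a single base

print(translate(ORIGINAL))    # MAEKLW
print(translate(SYNONYMOUS))  # MAEKLW
print(translate(IN_FRAME))    # MEKLW
print(translate(FRAMESHIFT))  # MLKNC
```

The synonymous change leaves the protein identical, the three-base deletion removes one amino acid but keeps the rest in frame, and the one-base deletion changes every codon after the deletion point.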

Just as with mutations in the coding region of Y-DNA genes, the odds that insertions/deletions occurred precisely in the lineages that would later undergo dramatic expansions are so low that it cannot be a coincidence. In fact, the ins/dels mirror the gene mutations mentioned above in haplogroups BT, B2b1a1, DE, E, J, and R.

But insertions and/or deletions can more generally be said to define the majority of top-level haplogroups, including A00, A0, A1, BT, B, C, DE, E, G, I, I1, J, N, O, R, R1a, R1b, T and T1. Note that haplogroups F and P didn't make the list and are almost extinct now. Haplogroup H isn't listed either; H2 became almost extinct, while H1a1 acquired a new mutation and prospered in South Asia.

Note that the major haplogroups that have had tremendous historical success at some point in human history, including haplogroups C (9 ins, 4 del), E (8 ins, 7 del), G (5 ins, 4 del), O (9 ins, 3 del) and R (15 del), have the highest numbers of insertions and/or deletions.

Even some relatively deep clades stand out. N1c1-L1026 is the main Finno-Ugric branch, that is, the European branch of an otherwise Asian haplogroup. Could this be a genetic adaptation to European X chromosomes? Along the same lines, R1b-M73 is the Asian branch, and R1b-M18 the African branch, of an otherwise mainly European haplogroup. Since the X and Y chromosomes do interact with one another, it is not impossible that these are all adaptations to the X chromosomes of populations with a long history of divergent evolution.

I1 and I2-M284, the two most successful I lineages outside Slavic countries, are the only ones with insertions or deletions, and these define exactly I1* and the M284 mutation itself.

R1b-L176 is the most successful subclade of DF27 and achieved remarkably high frequencies in Catalonia in a relatively short time since the Iron Age.

G2a2b2 was the main haplogroup of European Neolithic farmers, and as such was the main Y-haplogroup in Europe for several millennia.

E-M84 is the main Levantine/Jewish subclade of E1b1b, and a particularly successful branch.

O2a1 subclades M117, M121 and M134 are all major East Asian lineages. In the 2002 phylogeny they were known respectively as O3e1, O3a and O3d.

Analysis & Discussion

It could be hypothesised that haplogroups in which new beneficial mutations did not arise would become rarer, or even extinct, over time. That is what ancient DNA is telling us when looking back at the genetic landscape in western Eurasia since the Palaeolithic period. A great many Palaeolithic and Mesolithic lineages tested to date are in fact extinct. This includes some C1a2, K*, I* and I2* subclades that no longer exist, but also R1a and R1b branches (e.g. in Mesolithic Karelia and Samara) that didn't leave any descendants today. While the success of Bronze Age haplogroups can be attributed to superior military technology (bronze weapons, horses, chariots), that does not explain the success of most haplogroups since the Palaeolithic.

The earliest Out-of-Africa migration (among surviving lineages) was conducted by Y-haplogroups C and D. Yet, for some reason, C was considerably more successful, colonising all of Europe, the Middle East, North and South Asia, Oceania and later even the Americas, while haplogroup D ended up in secluded places like the Andamans, Tibet and Japan. Why is that? It could be just luck, but that is rarely the case in human evolution. Both haplogroups C and D spread in all directions and developed into numerous subclades. Yet, in all the regions they colonised, it was consistently C that left more descendants, even in regions where the two were found among the same ethnic groups. The only exception is haplogroup D1b in Japan. Looking at the list of insertions and deletions above, haplogroup C is defined by 13 of them, while D has none, but D1b has one. We don't know at present whether C and D spread together from Africa or represent separate migrations. But in any case, they would have met very early on and lived together for over 50,000 years until the present. In other words, natural selection was at work for longer between these two lineages than between practically any other two lineages in the prehistory of Homo sapiens. It could be that D was the more common of the two lineages at first, but that a very slightly improved fertility in members of haplogroup C increased its frequency by a fraction of a percent each generation. Over 50,000 years, or even over only 10,000 years for that matter, haplogroup C would have almost entirely replaced D in most communities. Haplogroup D could have survived only in specific isolated communities where a founder effect among the first settlers gave them 100% of D and 0% of C to start with. In Japan's case, a new ins/del mutation for D1b appeared early (about 45,000 years ago) and allowed that lineage to survive.
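The arithmetic behind the claim that a tiny per-generation advantage lets C overtake D can be checked with the standard deterministic model of haploid selection, in which carriers of one lineage leave (1 + s) times as many sons as everyone else, so its frequency updates as p' = p(1 + s) / (1 + sp). The starting frequency of 10% for C, the advantage s = 0.01 and the 25-year generation time are illustrative assumptions, not figures from this article, and the model ignores drift, migration and population structure.

```python
def frequency_after(p0: float, s: float, generations: int) -> float:
    """Deterministic haploid selection: carriers leave (1 + s) times as many sons."""
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)  # 1 + s*p is the mean relative fitness
    return p


def closed_form(p0: float, s: float, generations: int) -> float:
    """Same model solved directly: the odds p / (1 - p) grow by (1 + s) per generation."""
    odds = p0 / (1 - p0) * (1 + s) ** generations
    return odds / (1 + odds)


GENERATION_YEARS = 25  # assumed generation time, not taken from the article

for years in (10_000, 50_000):
    g = years // GENERATION_YEARS
    print(years, round(frequency_after(0.1, 0.01, g), 3))
```

Under these assumptions C rises from 10% to roughly 86% after 10,000 years (400 generations) and is effectively fixed after 50,000 years. Note that the per-generation gain at the start is only about 0.09 percentage points, i.e. a fraction of a percent.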

In Africa, the most ancient lineages to survive today (A00, A0, A0b, A1, B) were all founded by de novo ins/del events. That is probably not a coincidence. The oldest among them, A00, is about 250,000 years old and pre-dates the appearance of Homo sapiens by some 150,000 years! Then, 60,000 years ago, a new haplogroup emerged, haplogroup E, which acquired a series of 7 deletions and 8 insertions in one place on its Y chromosome in a single event. Those deletions altered the SRY (sex-determining region Y) gene near the tip of the Y chromosome. That apparently had a dramatic effect on the fertility of the carrier, as haplogroup E became the overwhelmingly dominant male lineage in Africa, a lead that no amount of genetic diversity or sexual competition from other haplogroups managed to overcome. In fact, E1b1a acquired four more fertility-boosting deletions and became the main lineage in Sub-Saharan Africa.

The same thing happened repeatedly with all other major haplogroups. Every time, new beneficial insertions and/or deletions were positively selected over the generations and replaced all other side lineages. That is essentially how haplogroups developed. Take 100 Y-DNA lineages descended from a recent ancestor. One of them possesses a new fertility-boosting mutation. What is going to happen? The carriers of this Y chromosome will have more children, or at least more sons, as higher sperm count and motility have been linked to an increased propensity to father sons (because male spermatozoa swim faster but die more easily in the uterus, so a high number of fast swimmers equates to a higher chance of having a boy). The X chromosome will develop new adaptations of its own to counterbalance those effects, but that process takes time and only works as long as men mate with women from their own tribe or ethnic group, where those newly evolved X chromosomes are present. When men start conquering vast expanses of land in a short time and mating with indigenous women, as happened with Bronze Age Indo-Europeans and later with Spaniards and Portuguese in the Americas, it is very likely that these conquerors, if they possess Y chromosomes with heightened fertility, will end up with a sex ratio slightly biased towards boys, thus quickly spreading their Y-DNA. This is one of the reasons why new Y-DNA haplogroups typically spread faster outside their original ethnic group than within it, where X chromosomes have had time to adapt progressively to each new Y-boosting mutation.
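The thought experiment of 100 lineages, one of which carries a fertility-boosting mutation, can be simulated with a simple Wright-Fisher model: each generation has a fixed number of men, and each son draws his father's Y chromosome at random, with carriers of the mutation weighted (1 + s) times more heavily. The function `wright_fisher_y`, the advantage s = 0.02 and the constant population of 100 men are hypothetical choices for illustration; the model only captures a net surplus of sons and does not include the X-chromosome counter-adaptation described in this paragraph.

```python
from __future__ import annotations

import random


def wright_fisher_y(n_men: int = 100, s: float = 0.02, seed: int | None = None,
                    max_generations: int = 100_000) -> list[float]:
    """Track the frequency of one new Y lineage until it is lost or carried by every man."""
    rng = random.Random(seed)
    carriers = 1  # the single lineage that acquired the new mutation
    history = [carriers / n_men]
    while 0 < carriers < n_men and len(history) <= max_generations:
        weight = carriers * (1 + s)  # carriers father (1 + s) times as many sons
        p_from_carrier = weight / (weight + (n_men - carriers))
        # each of the n_men sons inherits his Y from a randomly drawn father
        carriers = sum(rng.random() < p_from_carrier for _ in range(n_men))
        history.append(carriers / n_men)
    return history


runs = [wright_fisher_y(seed=i) for i in range(1000)]
fixed = sum(run[-1] == 1.0 for run in runs)
print(f"{fixed} of 1000 runs ended with the mutant Y in every man")
```

Because a single new lineage is easily lost by chance in its first few generations, only a small minority of runs typically end with the mutant Y in every man, even though the surviving runs tend to sweep quickly once the lineage has become common.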

When I originally wrote this article in 2013, I only considered mutations within Y-chromosomal genes as important for improvements in fertility. I have since come to consider that insertions and deletions may be just as important. There may be yet another factor I haven't considered that also plays a role in fertility, namely the X chromosome itself. The X chromosome is not only important for women, but also plays a role in male health and fertility. Autosomal genes could also play a role, such as those governing how a woman's immune system reacts to spermatozoa. Some immune systems are more aggressive than others, which can lead to female infertility (the immune system destroying all 'foreign cells'). A lower immune response would favour boys, as male spermatozoa swim faster but are killed off more easily. The same is true of cellular acidity, which is in part regulated by mitochondrial DNA. Members of mtDNA haplogroups U and K have more alkaline cells, which tends to favour boys, as male spermatozoa survive better in alkaline environments. A too welcoming uterine environment (low immune reaction + alkaline uterus) would skew the sex ratio towards more boys.

Copyright © 2004-2017 All Rights Reserved.