My proposed tree of Indo-European languages

Maciamo · May 11, 2018

Johane Derite posted a list of different phylogenetic trees of IE languages proposed by various linguists in another thread. I thought it would be an ideal opportunity for me to post my proposed phylogenetic tree, which I have based not only on linguistic evidence, but also on archaeological and especially genetic evidence (using Y-chromosomal phylogeny). It differs radically from all the trees proposed by professional linguists, but mine is the only one that makes sense based on Y-DNA phylogeny and the known patterns of migrations combining archaeology and ancient DNA.

I have kept it simple and schematic, but I felt it was necessary to add the associated haplogroups to show that languages evolve through population hybridisation, which tends to affect pronunciation and involves the absorption of loan words.

I believe that the Italo-Celtic branch intermingled more extensively with Neolithic European farmers than the Goidelic branch. This is obvious from the relatively high percentages of G2a-L497 and E-V13 among Hallstatt-derived Celts and Italics. I believe that this EEF mixture came originally from the Cucuteni-Trypillian culture, although indirectly. The R1b-L51 branch expanded along the Danube to Central Europe while the R1a/R1b-Z2103 branch of the Corded Ware spread along the North European Plain. The latter would probably have been the ones who mixed with the scattered and by now nomadic tribes who abandoned the Trypillian cities in Western Ukraine. Corded Ware tribes met R1b-L51 tribes in Germany, Czechia and western Poland. But by that time some R1b-L21 and R1b-DF27 adventurers had already permeated the Bell Beaker trade network all the way to the Atlantic coast, before they got the chance to mix with Corded Ware people - hence the absence of E-V13 and G2a-L497 from these Atlantic Celts (Q-Celtic speakers). The Neolithic influence on language eventually led to the Q to P shift in Hallstatt and La Tène Celtic tongues, soon after the split with the Italic tribes.

Proto-Germanic R1b-U106 also mixed with the Corded Ware people and with the earlier inhabitants of the Netherlands, northern Germany and Denmark, who were probably heavier on Mesolithic ancestry and would have carried haplogroups I1 and I2-L801. I believe that a small but noteworthy non-IE pre-Germanic substratum exists in Germanic languages, although many linguists seem to be confused by the fact that some of these loan words eventually found their way into other IE languages because of the Germanic migrations. Germanic loan words infiltrated not only Romance, but also Slavic, Baltic, Albanian and possibly also Greek languages. Germanic languages also seem to have some Balto-Slavic influence, perhaps through the absorption of predominantly R1a Corded Ware tribes.

The complicated part that really gets most linguists confused is the Eastern branch. This is because it is in fact two branches: the original East Yamna (R1b-Z2103) and the extension of that Yamna branch into the forest-steppe, which in my opinion is when the satem shift took place. The southern tribes of the Late Yamna and Catacomb (2800–2200 BCE) cultures (both R1b-Z2103) were ousted from the Pontic Steppe by the expansion of the Srubna culture (R1a with some R1b-Z2103) to the north, and the R1b-Z2103 migrated to the Balkans, where they became the Illyrians (incl. Proto-Albanian), Mycenaean Greeks, Phrygians and Proto-Armenians. The latter two eventually migrated from the Balkans to Anatolia around the time of the Bronze Age collapse c. 1200 BCE. Later influence from Iranian tribes in Armenia caused a partial satemisation of the Armenian language. The same thing might have happened to Albanian and Greek due to the migrations of other Iranian tribes (Bulgars) and Slavs to the region. This is why Albanian and Armenian in particular cannot be definitively classified as centum or satem.

The Tocharian branch is in all likelihood descended from the Afanasievo culture (3300-2500 BCE), a Steppe culture in the Altai region that is contemporary with Yamna (3500-2500 BCE), but started a few centuries later.

I have racked my brain about the Anatolian branch, but IMO the most likely explanation remains that it was an early offshoot from the Pontic Steppe to the Balkans dating from 4200 to 3700 BCE. These people would have stayed a while in the Balkans then, like the Phrygians and Armenians much later, would have moved east to Anatolia. The oldest archaeological site associated with Anatolian IE speakers might be Troy, a city that was founded c. 3000 BCE to control the trade between the Aegean and the Black Sea region, including the Pontic Steppe. It makes sense that Steppe people should have wanted to control trade with their homeland. The language likely to have been prevalent in the historical city of Troy is Luwian, an Anatolian IE language.

Ownstyler · May 11, 2018

Well done! Simple but informative. Why did you put E1b1b instead of E-V13 for Greece? Is all the E-V13 recent introgression?

Maciamo · May 11, 2018

Because there is a lot of variety among Greek E1b1b. There is at least E-M34, E-M81, E-V22, in addition to E-V13. I don't know how much of the Greek E-V13 came during the Bronze Age, but it is certain that some of it is of non-Greek origin in historical times (Celtic, Roman, Gothic, Slavic). I am looking forward to getting more data on deep V13 clades from Greece to shed some light on its amalgamated origins.

ToBeOrNotToBe · May 11, 2018

Entirely agreed - it all seems fairly conclusive, except with the Anatolians, where it could really go either way (post or pre Steppe, that is).

Expredel · May 11, 2018

Did you consider the Centum / Satem split?

https://en.wikipedia.org/wiki/Centum_and_satem_languages

Greek and Tocharian should be closer to the Celtic branch based on that division. Some linguists claim Armenian and Albanian might be Centum or a third branch.

Languages may also develop more slowly in smaller populations with no national borders, so I think the Slavic branch might be further back.

Maciamo · May 11, 2018

I explained it above. Essentially R1b-dominant groups are centum and R1a-dominant ones are satem. Albanian and Armenian were originally centum but got partially satemised by Iranian and Slavic influences.

bicicleur 2 · May 11, 2018

there have been 2 theories about Celtic origins, but IMO the simplest explanation is that there were 2 major Celtic branches which split quite early (4.5 ka) :
-the Atlantic Celts R1b-L21 which represents the arrival of the Bell Beakers in the British Isles 4.5 ka
-the Central-European Celts, distributed by the Urnfield people, from which Hallstatt and La Tène are supposed to descend
there must have been more early Celtic branches, all of which went extinct

Maciamo · May 11, 2018

I completely agree. I didn't list all the extinct branches to keep the tree easy to read. R1b-DF27 is among the branches of Celtic that became extinct. I didn't list Ligurian and Lusitanian either, but they would be side branches of Q-Celtic and Italo-Celtic too.

Illyrian would be under Balkanic, probably ancestral to Albanian. Daco-Thracian is probably also a Balkanic branch.

Greek would have started as Mycenaean Greek next to Illyrian, Phrygian and Armenian under Balkanic. It supposedly evolved to Doric Greek after the mysterious Dorians moved in after 1200 BCE. I believe that the Dorians could have been related to Hallstatt Celts as they came from the North and their name sounds very Celtic (Dorian is a Gaulish given name that still survives in French to this day). If they had come straight from the Steppe like the Mycenaeans, Doric Greek would be a satem language. In terms of DNA, there aren't that many possibilities either based on what haplogroups are present among modern Greeks. Dorians were Indo-European so that makes them either R1b or R1a, but since virtually all R1a in Greece is Gothic or Slavic, that only leaves R1b. If we buy into the premise that R1b-Z2103 is Mycenaean, the remainder of modern Greek R1b is mostly U152 with a bit of L21 and DF23. Add to that the presence of E-V13 and G2a-L497 branches that could also be Alpine Celtic and we get a lot of reasons to think that the Dorians were indeed a branch of Central European Celts. No other known historical migration of Celts to Greece could have had such a lasting genetic impact. For example, Brennus' invasion of 279 BCE was not a mass migration of resettlement, just a military plundering campaign.

A. Papadimitriou · May 11, 2018

Herodotus practically said that the Dorians were originally from South Thessaly, close to the modern Greek city of Lamia.
The Dorian invasion isn't supported by anything, neither archaeology nor ancient sources.

Herodotus' account points to a movement from South Thessaly to Pindus (Epirus & West Macedonia) to Central Greece and finally to the Peloponnese.

Ownstyler · May 11, 2018

Maciamo said:
Greek would have started as Mycenaean Greek next to Illyrian, Phrygian and Armenian under Balkanic. It supposedly evolved to Doric Greek after the mysterious Dorians moved in after 1200 BCE. I believe that the Dorians could have been related to Hallstatt Celts as they came from the North and their name sounds very Celtic (Dorian is a Gaulish given name that still survives in French to this day). If they had come straight from the Steppe like the Mycenaeans, Doric Greek would be a satem language. In terms of DNA, there aren't that many possibilities either based on what haplogroups are present among modern Greeks. Dorians were Indo-European so that makes them either R1b or R1a, but since virtually all R1a in Greece is Gothic or Slavic, that only leaves R1b. If we buy into the premise that R1b-Z2103 is Mycenaean, the remainder of modern Greek R1b is mostly U152 with a bit of L21 and DF23. Add to that the presence of E-V13 and G2a-L497 branches that could also be Alpine Celtic and we get a lot of reasons to think that the Dorians were indeed a branch of Central European Celts. No other known historical migration of Celts to Greece could have had such a lasting genetic impact. For example, Brennus' invasion of 279 BCE was not a mass migration of resettlement, just a military plundering campaign.

You're assuming that the Dorians had a distinct genetic impact. But what if they came from exactly the same place as the Mycenaeans? They might have migrated together out of the Steppe, but some stopped on the way, only to migrate into Greece after a few hundred years. So it's possible that the Dorians, if they existed, were the same R1b clade as the Mycenaeans, just like other Balkan BA peoples are.

Sile · May 11, 2018

Proto-Italo-Celto-Illyro-Thraco-Dacian was a single language. After that, some phonological changes appeared in different dialects of this proto-language. Namely, in the dialect in the middle of this group, from which Continental Celtic, Oscan and Umbrian evolved, the labiovelars (kʷ, gʷ) turned into bilabials (p, b). The innovation affects all these languages (one should remember that the forefathers of the Oscans and Umbrians migrated from the upper Danube valley into the Italian peninsula) (see ultra).

In the eastern vicinity of this group there was the Thraco-Illyrian group, which did the same thing, but only to the labiovelars followed by back vowels (*a, *o), while the labiovelars followed by a front vowel (e, i) were palatalized along with the regular velar sounds. One may conclude that in Thraco-Illyrian the palatalization before a front vowel took place at about the same time as the bilabialization of the labiovelars. I should emphasize that the bilabialization of labiovelars did not reach the peripheral dialects such as Insular Celtic, Latino-Faliscan and the Epirote dialect (from which Proto-Albanian evolved) (see ultra). I should also mention that the palatalization of velars followed by a front vowel affects all velars (and dentals) and has nothing to do with the centum/satem distinction.

The Relationship between the Thraco-Illyrian, Italic, and Celtic Languages. Indo-Europeanists divide the Celtic and Italic languages into two major groups: the Q-dialects and P-dialects. The Q-Celtic dialects, such as Proto-Irish and Proto-Celtiberian, were those which separated earlier from the main group, according to the treatment of Proto-Indo-European labiovelars in these languages. The P-dialects turned the labiovelars into bilabials, while Q-dialects turned the labiovelars into simple velars. Instead, east of the Pyrenees, the Celtic dialects have turned the Proto-Indo-European labiovelars into labials, like in Osco-Umbrian.

Ygorcs · May 11, 2018

Very nice proposal, it's certainly good food for thought on the still uncertain history of how the different IE branches split from each other. As far as I've read about this topic, I think your tree should be much closer to the truth (which we will eventually find out, I hope so) than that made by some linguists, especially those who leaned toward the Anatolian Neolithic hypothesis for the dispersal of IE.

I must say, however, that to say that languages "evolve through population hybridisation" is only a half truth, because even in the absence of such genetic changes in the demographic makeup of the population there can be profound and unusually rapid linguistic evolution triggered by a random combination of societal and cultural developments (e.g. changes from Middle English to Modern English mainly between 1300 and 1650; profound phonetic changes in European Portuguese from 1600 to 1800).

Also, the mere isolation (not just geographic, but in some cases political and/or cultural) of two different strands of the very same people can lead to very divergent outcomes in the long term. So, I'd be a little wary of assigning all the development of independent IE branches to instances of population hybridization, though this interaction with absorbed non-IE peoples must have been one of the decisive factors in that history.

I agree with virtually all the proposed positions for the IE subfamilies in your tree, but I have two minor quibbles, which are the following.

First of all, it's important to say that these phylogenetic trees unfortunately have some kind of inherent flaw unless you establish some shaded connections between some groups that had originally started to diverge significantly, but apparently remained in closer contact with each other and ended up evolving/diverging more slowly than would've been the case, as well as sharing more areal features and regional dispersal of loanwords and new phonetic/syntactic trends (Sprachbund). There is also the very possible case that the language vis a vis Y-DNA makeup correlation became much more blurred and flawed with time because of the huge expansions of just a few languages (even in historic times, e.g. Latin, Gaulish/La Tène Celtic, but that also certainly happened in illiterate/unattested periods).

In my opinion, for instance, Goidelic (Q-Celtic) was probably not the original language of the R1b-L21 people in insular Northwestern Europe, because from a glottochronological perspective it simply does not look like it diverged soon enough from continental Celtic, including P-Celtic branches. Earlier, yes, certainly, but not before the final split of Celtic and Italic languages, related though as these latter may be. Most of the estimates I've seen place Q-Celtic Goidelic diverging around 800-1100 BC.

In any case, Goidelic is most definitely a sister to P-Celtic through an intermediate Proto-Celtic stage, and despite some superficial similarities (especially in phonology and basic grammar) P-Celtic (e.g. Gaulish) is not any closer to Italic than to Q-Celtic. There is also very substantial differentiation in lexicon between Celtic (Q or P Celtic) and Italic, which is suggestive of some significantly different (in geography and/or chronology) influences.

So, in my opinion the most probable scenario is that a para-Celtic tongue (more Celtic than Italic, but with affinities to both, maybe more or less like Lusitanian) had been the native language of the future Goidelic population, but they shifted to a similar but still distinct language when the huge "Celtic proper" (Late Urnfield/Early Hallstatt) expansion began - and of course they must've shifted to it in their own way, lending the new language much of the syntactic and some of the lexical peculiarities of their former language, this process being made even easier because the two languages were close enough to allow some hybridization without losing intelligibility (think of Portuñol Riverense on the border between Brazil and Uruguay).

So, to sum it up, for me the situation was a bit more complex: I'd place Goidelic as a very early offshoot of Proto-Celtic, probably due to language shift and linguistic convergence with an older sister language, and a bit further apart from them. Those early splits (Italo-Celtic into Celtic vs. Italic and then Goidelic vs. P-Celtic vs. Italic) must've been really fast in historical terms, the probable sign of a rapid and huge cultural expansion (Urnfield?).

Besides that, I'd place Albanian (and probably its Illyrian mother or maybe grandmother tongue) together with Graeco-Armenian but in a position closer to - and in some ways related to (maybe a dotted line could do that trick) - the Northeastern branch, particularly Balto-Slavic. At least according to a quite comprehensive book about the history of Albanian vocabulary and grammar that I've read, Albanian shares the largest number of isoglosses first with Greek and second with Balto-Slavic, especially in fact the Baltic languages.

So, in my opinion Albanian (and we know Illyrian came originally from further north in the Balkans, possibly near the ancient homeland of Balto-Slavic) should've been a sort of middle ground between the Southeastern/Balkanic and the Northeastern IE branches, subject to regional trends and share of loanwords (or maybe of similar non-IE substrate) not just with other Balkanic IE dialects, but also with Northeastern IE ones. The tree format doesn't lend itself very well to demonstrate these instances of languages that diverged significantly but due to geographic closeness, despite their distinct origins from different nodes, remain more closely related to each other than you would otherwise expect.

Tutkun Arnaut · May 11, 2018

Albanian shares grammatical similarities with Germanic languages, and also similarities with Balto-Slavic. This means early Albanian speakers were close to R1-majority speakers. It does not make a lot of sense to group it with Armenian or Greek.

LABERIA · May 11, 2018

Can you change your nickname, please? How many times do we have to ask you to change it?
Is it possible for one of the mods to resolve this situation?

Ygorcs · May 11, 2018

Tutkun Arnaut said:
Albanian shares grammatical similarities with Germanic languages, and also similarities with Balto-Slavic. This means early Albanian speakers were close to R1-majority speakers. It does not make a lot of sense to group it with Armenian or Greek.

AFAIK Albanian is very unique because it shares most isoglosses with Greek, but Balto-Slavic is in a very close 2nd position, and then there are also appreciable Germanic isoglosses, as well as some grammatical similarity. Ancient Venetic/Liburnian languages also seem to have had this odd resemblance with some Germanic words, and they were in or adjacent to the northwestern part of the Balkans. I wonder if there was a Central European branch or group of branches, including Venetic/Liburnian and also Illyrian, which were basically an intermediate stage between Germanic in the north, Balto-Slavic in the east, Graeco-Armenian in the south and Celto-Italic in the west.

Ygorcs · May 11, 2018

LABERIA said:
Can you change your nickname, please? How many times do we have to ask you to change it?
Is it possible for one of the mods to resolve this situation?

What's the problem there, could you explain? Have you considered that, if it sounds offensive, it may just be a kind of tongue-in-cheek irony or sarcasm or something like that? Anyway, in order to avoid going off topic, could you please consider making a specific topic about this issue in the Site Feedback/Admin Contact section? That'd avoid some confusion in this topic. Thanks.

Maciamo · May 11, 2018

Ygorcs said:
I must say, however, that to say that languages "evolve through population hybridisation" is only a half truth, because even in the absence of such genetic changes in the demographic makeup of the population there can be profound and unusually rapid linguistic evolution triggered by a random combination of societal and cultural developments (e.g. changes from Middle English to Modern English mainly between 1300 and 1650; profound phonetic changes in European Portuguese from 1600 to 1800).

I agree that languages do not evolve only by population hybridisation. But whenever two populations speaking different languages blend together, or one dominant minority imposes its language on a population speaking another tongue, languages tend to evolve more quickly.

To use your example of Middle English to Modern English, it was in fact the result of a blend of two populations, the French-speaking Norman elite and the Middle English-speaking populace. The two started to converge in the 14th century when French was replaced by Middle English as the official language of the English court and government. The merger of the two languages resulted in modern English.

The same thing happened repeatedly in the evolution of the Japanese language. During the Mesolithic/Neolithic Jomon period, genetic and linguistic evidence indicates that Austronesian people (Y-DNA O1a) from Taiwan reached Japan and mixed with native Jomon (Y-DNA C1a1 and D2b). From 500 BCE, the Yayoi from Korea invaded Japan with their Iron Age technological package and blended with the Jomon. The blend of the two populations resulted in a hybrid language, just like modern English. From the 6th century, the Japanese adopted Chinese characters and Buddhism and started importing thousands of Chinese words. Nowadays Japanese often has two words for the same meaning, one from Chinese and an originally Japanese (Yayoi-Jomon-Austronesian trihybrid) one. After WWII, the Americans occupied Japan for 6 years and Japanese society became very Americanised. The Japanese have since been continuously importing new English words every year. Many have come to replace preexisting Japanese words. For example, the Japanese use the English words table, fork, knife, door, glass, cup, bowl, pan, light, service, toilet, etc. on a daily basis even though there are Japanese words for them (many of which the younger generations no longer know, as only the English words are used nowadays). The younger the Japanese, the more likely they are to use English words instead of native Japanese words, so much so that they often have difficulty communicating with their grandparents or great-grandparents. What is funny is that those young Japanese who adopt English words may not be fluent in English at all, and indeed may be incapable of holding a conversation in English, so it's not because of bilingualism that English pervades their language. It's just the overwhelming cultural influence of English-speaking countries (not just the USA).
So, while the first (Austronesian) and second (Yayoi) major evolutions of the Japanese language occurred through population hybridisation, the third (Chinese) and fourth (English) ones happened purely through cultural diffusion.

Ygorcs said:
I agree with virtually all the proposed positions for the IE subfamilies in your tree, but I have two minor quibbles, which are the following.

First of all, it's important to say that these phylogenetic trees unfortunately have some kind of inherent flaw unless you establish some shaded connections between some groups that had originally started to diverge significantly, but apparently remained in closer contact with each other and ended up evolving/diverging more slowly than would've been the case, as well as sharing more areal features and regional dispersal of loanwords and new phonetic/syntactic trends (Sprachbund). There is also the very possible case that the language vis a vis Y-DNA makeup correlation became much more blurred and flawed with time because of the huge expansions of just a few languages (even in historic times, e.g. Latin, Gaulish/La Tène Celtic, but that also certainly happened in illiterate/unattested periods).

That's true, but the above tree stops around the Late Bronze to Early Iron Age. I did not expand the various Italic then Romance branches, as that would have required showing the influence of other languages on the development of Romance languages. For example, French was a form of Vulgar Latin slightly influenced by Gaulish and considerably influenced by Frankish (pronunciation + hundreds of loan words). Romanian has Slavic influence. South Italian dialects have Greek influence. Sardinian still has a Nuragic substratum.

In my opinion, for instance, Goidelic (Q-Celtic) was probably not the original language of the R1b-L21 people in insular Northwestern Europe, because from a glottochronological perspective it simply does not look like it diverged soon enough from continental Celtic, including the P-Celtic branches. Earlier, yes, certainly, but not before the final split of the Celtic and Italic languages, related though the latter two may be. Most of the estimates I've seen place Q-Celtic Goidelic diverging around 800-1100 BC.

Maybe, but it's hard to see how that Celtic language would have come from the continent during the 800-1100 BCE time frame. The British Beaker DNA showed that the Irish population remained almost unchanged from 2200 BCE until the Anglo-Norman and Viking colonisations. Hallstatt people only reached southern England around 500 BCE and never colonised the rest of Britain or Ireland. La Tène Celts did reach Ireland and Scotland after 200 BCE, but they spoke P-Celtic. Perhaps Goidelic evolved less quickly precisely because Ireland and Scotland had low populations and were very isolated from foreign influences for so long. I think that languages, just like DNA, evolve faster in large or cosmopolitan populations than in small isolated ones. That much can be gathered from Icelandic, which remained more archaic than other Scandinavian languages due to its small population, even though there was hardly any historical migration toward continental Scandinavia either.

In any case, Goidelic is most definitely a sister to P-Celtic through an intermediate Proto-Celtic stage, and despite some superficial similarities (especially in phonology and basic grammar) P-Celtic (e.g. Gaulish) is not any closer to Italic than to Q-Celtic. There is also very substantial differentiation in lexicon between Celtic (Q or P Celtic) and Italic, which is suggestive of some significantly different (in geography and/or chronology) influences.

The above tree shows the languages at the time they diverged. You are talking about written evidence of Italic, P-Celtic and Q-Celtic languages that came many centuries (or even millennia for Q-Celtic) after the split occurred. Classical Latin, for example, was probably quite different from Proto-Italic because the Romans had mixed with and absorbed so many neighbouring populations, including non-IE ones like the Etruscans (and undoubtedly others from the native populations conquered by Italic tribes, notably in the Apennines and Latium). So Latin was surely already a hybridised language by the time it was first written down. Ditto for Gaulish. Modern Romance languages are all partial hybrids with various Celtic, Gascon, Nuragic, Germanic, Greek and Slavic influences. There is no reason to think that the Italic languages of 200 BCE had not similarly evolved over the 1000 years since the Italic tribes arrived in Italy, especially since they did intermingle extensively with indigenous people and other latecomers (Etruscans, Greeks).

We can't reasonably estimate the degree of similarity between Proto-P-Celtic and Proto-Italic 3000 years ago based on the Classical Latin of 1000 years later and very partial elements of the Gaulish language, equally remote from the Urnfield/Early Hallstatt source. But Y-DNA can tell us how closely related these people were, and I think that should trump nonexistent or heavily derived linguistic evidence.

Besides that, I'd place Albanian (and probably its Illyrian mother or maybe grandmother tongue) together with Graeco-Armenian, but in a position closer to the Northeastern branch, particularly Balto-Slavic, and in some ways related to it (maybe a dotted line could do that trick). At least according to a quite comprehensive book about the history of Albanian vocabulary and grammar that I've read, Albanian shares the largest number of isoglosses first with Greek and second with Balto-Slavic, especially, in fact, the Baltic languages.

So, in my opinion Albanian (and we know Illyrian came originally from further north in the Balkans, possibly near the ancient homeland of Balto-Slavic) should've been a sort of middle ground between the Southeastern/Balkanic and the Northeastern IE branches, subject to regional trends and the sharing of loanwords (or maybe of a similar non-IE substrate) not just with other Balkanic IE dialects, but also with Northeastern IE ones. The tree format doesn't lend itself very well to showing these cases of languages that diverged significantly but, due to geographic closeness and despite their distinct origins from different nodes, remain more closely related to each other than you would otherwise expect.

The situation with Albanian is the same. The Balto-Slavic influence most certainly dates from the Gothic and Slavic migrations. After all, modern Albanians have about 21% of Slavic Y-DNA and 5% of Germanic Y-DNA, and that is without counting the 27% of E-V13, which is at least partially Gothic or Slavic. That doesn't mean that the Albanian language itself descends from the same R1a-Z280 branch as Balto-Slavic. It would be like saying that French evolved from a branch in between the Italo-Celtic and Germanic languages because it has influences from both! No, obviously French is Italic, and the Germanic influence is merely loanwords from the Franks, who came much later.

When looking at modern languages like Albanian without knowing much about their ancient ancestors (like Illyrian), all you see is the cooked dish, but it's not always easy to guess the ingredients, and even less in which order those ingredients were added.

The advantage with Y-DNA (although not autosomal DNA) is that you can follow the evolution step by step, generation after generation, for the whole world population. With ancient DNA and archaeology we can even put dates on the population movements matching that grand human genealogy. And since languages follow people when they migrate, we can usually assume that the genetic phylogeny matches the linguistic phylogeny - as long as we account for population hybridisation and how it affects a society's dominant language.

The more dominant a group is culturally or politically, the more of its language will survive the hybridisation process. We have seen that culture trumps politics, as in the case of Germanic tribes adopting Latin despite having become the ruling class. Large population replacements are almost always accompanied by an unambiguous language replacement too (e.g. Anglo-Saxons in England). But sometimes a small elite can impose its language on a much larger population if it is culturally dominant, and even more so if it is accompanied by a change of religion (e.g. Sanskrit, Arabic, Spanish in Mexico and the Andes).

Maciamo · May 11, 2018

Ownstyler said:
You're assuming that the Dorians had a distinct genetic impact. But what if they came from exactly the same place as the Myceneans? They might have migrated together out of the Steppe, but some stopped on the way, only to migrate into Greece after a few hundred years. So it's possible Dorians, if they existed, were the same R1b clade as Myceneans, just like other Balkan BA peoples are.

That's possible, but how do you account for the Celtic Y-DNA in Greece then? Or is it all Roman? (possible as the two are closely linked)

Ygorcs · May 12, 2018

Maciamo said:
To use your example of Middle English to Modern English, it was in fact the result of a blend of two populations, the French-speaking Norman elite and the Middle English-speaking populace. The two started to converge in the 14th century when French was replaced by Middle English as the official language of the English court and government. The merger of the two languages resulted in modern English.

I agree with your main point, but disagree on the particular example you used. The bulk of the Norman-influenced (but not necessarily triggered) changes happened during the transition from Old English to Middle English, ranging from ~1000 to ~1350 AD. By 1400 the really significant Anglo-Norman influence had virtually ceased, and it seems that Norman French ceased to be the dominant language of the courts and elite by the late 1300s and by the late 1400s wasn't even spoken in significant numbers any longer, so I think it's unlikely that the main changes that happened from Middle English to Modern English involved significant Anglo-Norman influence, especially because the most transformative changes in phonology and grammar seem to have consolidated during the 16th century, well after French ceased to be a formal court/state language. On the other hand, you're absolutely right about the very fascinating development of the Japanese language.

The above tree shows the languages at the time they diverged. You are talking about written evidence of Italic, P-Celtic and Q-Celtic languages that came many centuries (or even millennia for Q-Celtic) after the split occurred. Classical Latin, for example, was probably quite different from Proto-Italic because the Romans had mixed with and absorbed so many neighbouring populations, including non-IE ones like the Etruscans (and undoubtedly others from the native populations conquered by Italic tribes, notably in the Apennines and Latium). So Latin was surely already a hybridised language by the time it was first written down. Ditto for Gaulish. Modern Romance languages are all partial hybrids with various Celtic, Gascon, Nuragic, Germanic, Greek and Slavic influences. There is no reason to think that the Italic languages of 200 BCE had not similarly evolved over the 1000 years since the Italic tribes arrived in Italy, especially since they did intermingle extensively with indigenous people and other latecomers (Etruscans, Greeks).

I don't think that's the most likely explanation for the perceived larger distance between attested P-Celtic and Italic compared with that between P-Celtic and Q-Celtic, especially because most of the lexical differences are not particularly related to different non-Italo-Celtic substrates (which would be easily explained by later and distinct processes of language contact in each of those languages). Instead they rely more on the particular "choices" of different IE stems and roots to form derived terms in P-Celtic versus Italic. Celtic uses a lot of IE roots that aren't found with the same meaning in Italic, and vice versa.

Then we're probably talking about a much more ancient dialectal differentiation, because they have to do with the construction of the basic vocabulary through internal means of noun/verb derivation, using the core roots they inherited from an earlier Western IE dialect.

So, my best guess to explain that, considering that this differentiation didn't only involve different external influences, but also different internal developments, would be that Proto-Celtic and Proto-Italic diverged earlier, then a bit later Goidelic Q-Celtic and possibly other extinct Celtic branches split apart, and the extant subfamily which would become P-Celtic remained in close contact with Proto-Italic and formed a kind of Sprachbund with areal features that prevented their further and more rapid divergence from each other, while Goidelic, very far away, didn't experience those constraints.

As for how Goidelic could've come to be without any massive migration into Ireland, I'm not completely sure, considering the very new and still unpublished Reich study about Iron Age Britain, that we will really find no fine-scale change at all between Bell Beaker and Anglo-Norman/Viking Ireland, though it may definitely have involved only an influx of a very similar (genetically) people.

But even if that did not happen, as I said in my 1st answer, I think a language shift with a significant degree of hybridization (kind of a Celtic "Portuñol") is much more feasible when you have a local people who speak a language still very closely related to another language that has more cultural and economic prestige and is spoken in the neighboring region. That makes the shift very easy and actually almost imperceptible, because the foreign influences slip into the local language almost without anybody noticing, e.g. see how Galician has become clearly less and less Portuguese-like and more and more Castilianized in the last 500 years, gradually over the generations. That neighboring powerful language could maybe have been an early Urnfield/Hallstatt Celtic language in Britain, maybe also initially spoken on the Atlantic shore of the continent.

Another possibility, maybe even more plausible in fact, is that Goidelic is simply the Bell Beaker Para-Celtic that became gradually and heavily "Celticized" over several generations due to long-term influence from a language so similar that a re-convergence (maybe combined with widespread bilingualism) became almost inevitable, like the "Castilianized Galician" you may often hear in bilingual urban and cosmopolitan areas of modern Galicia.

But of course here I am speculating to try to reconcile this supposed early divergence of Goidelic with the linguistic comparisons and glottochronological evidence, which do not support a very ancient split between P-Celtic and Q-Celtic.

Expredel · May 12, 2018

Maciamo said:
I explained it above. Essentially R1b-dominant groups are centum and R1a-dominant ones are satem. Albanian and Armenian were originally centum but got partially satemised by Iranian and Slavic influences.

It doesn't quite explain why Tocharian isn't within a centum branch, unless you assume IE started out as centum, with satem emerging later.

Another improvement you should consider is adding a time scale, and matching it precisely with Y-DNA dates. This might bring some inconsistencies to light.
