Many linguists have attempted to reconstruct how the original Proto-Indo-European (PIE) numerals may have sounded. Unfortunately, they tend to end up with overly theoretical prototypes that are unlikely ever to have been uttered by anyone. Here are two examples of reconstructions, both dating from 1995.
[TABLE="class: wikitable"]
[TR]
[TH]Number[/TH]
[TH]Reconstruction (Sihler)[/TH]
[TH]Reconstruction (Beekes)[/TH]
[/TR]
[TR]
[TD]one[/TD]
[TD]*Hoi-no-/*Hoi-wo-/*Hoi-k(ʷ)o-; *sem-[/TD]
[TD]*Hoi(H)nos[/TD]
[/TR]
[TR]
[TD]two[/TD]
[TD]*d(u)wo-[/TD]
[TD]*duoh₁[/TD]
[/TR]
[TR]
[TD]three[/TD]
[TD]*trei- (full grade) / *tri- (zero grade)[/TD]
[TD]*treies[/TD]
[/TR]
[TR]
[TD]four[/TD]
[TD]*kʷetwor- (o-grade) / *kʷetur- (zero grade)[/TD]
[TD]*kʷetuōr[/TD]
[/TR]
[TR]
[TD]five[/TD]
[TD]*penkʷe[/TD]
[TD]*penkʷe[/TD]
[/TR]
[TR]
[TD]six[/TD]
[TD]*s(w)eḱs; originally perhaps *weḱs[/TD]
[TD]*(s)uéks[/TD]
[/TR]
[TR]
[TD]seven[/TD]
[TD]*septm̥[/TD]
[TD]*séptm[/TD]
[/TR]
[TR]
[TD]eight[/TD]
[TD]*oḱtō, *oḱtou or *h₃eḱtō, *h₃eḱtou[/TD]
[TD]*h₃eḱteh₃[/TD]
[/TR]
[TR]
[TD]nine[/TD]
[TD]*(h₁)newn̥[/TD]
[TD]*(h₁)néun[/TD]
[/TR]
[TR]
[TD]ten[/TD]
[TD]*deḱm̥(t)[/TD]
[TD]*déḱmt[/TD]
[/TR]
[/TABLE]
It is painfully obvious how much these linguists were influenced by their classical education, which gave them a strong bias in favour of Latin- and Greek-sounding PIE reconstructions. For example, the two w sounds in kʷetwor are found only in Latin (quattuor) and in no other Indo-European language. It is very odd that Sihler and Beekes both reconstructed the number four this way. I also don't see the need for a w in penkʷe. The final m in séptm is just as fanciful, and is also found exclusively in Latin. One may also wonder where the h in Hoi-no/Hoi(H)nos and h₃eḱtō/h₃eḱteh₃ comes from.
In my opinion, Latin and Greek are pretty poor proxies for reconstructing the original PIE because the ancient Greeks and Romans were thoroughly admixed with indigenous non-Indo-European populations (Pelasgians, Minoans, Etruscans) that severely altered the Indo-European pronunciation and vocabulary.
I believe that population genetics can be very useful in assessing the influence of non-Indo-European populations in regions that now speak Indo-European languages. Only about 30% of Greek Y-DNA haplogroups are of Indo-European origin, much less than the average for other Indo-European-speaking regions. The Greek branch of IE languages is probably one of the most hybridised, along with the Anatolian and Albanian branches.
Better proxies are Indo-Iranian, Baltic, Slavic and Celtic languages. I would also be careful with Germanic languages, as they too underwent heavy hybridisation.
The obvious reason why ancient Greek and Latin are so influential in PIE reconstructions is that there is a wealth of written documents in those ancient languages, with no equivalent in ancient Balto-Slavic or even Celtic languages. Anatolian languages and Mycenaean Greek are the oldest recorded IE languages, but that doesn't mean they are the purest. Far from it: they are surely those with the strongest non-IE influence, as they were imposed by a small illiterate elite on advanced ancient civilisations. The more culturally advanced a conquered population is compared to its conquerors, the higher the chance that it will retain all or part of its language and culture. Germanic peoples adopted Latin after invading the Roman Empire, and the Mongols and Manchus adopted Chinese after becoming emperors of China.
It works the other way round too. The Sardinians, who spoke a non-IE language until approximately 2000 years ago, were so close to Rome and its cultural influence that they ended up adopting Latin with only a minimal genetic influx from Latium to Sardinia.
I also think that the Centum-Satem isogloss dates back to the very foundation of Proto-Indo-European culture and society in the Pontic-Caspian Steppes during the Late Neolithic or Chalcolithic. Although R1a and R1b populations did intermix to some extent, there were always two main groups: the southern, R1b-dominant group that migrated west (Centum) and the northern, R1a-dominant group that migrated east, then south (Satem). Although the two groups adopted a common lingua franca, some differences in pronunciation probably existed from the start, so it is futile to try to force the two into a single form in every case.
Taking all this into account, and giving more weight to "purer" branches of the Indo-European family (Celtic, Balto-Slavic, Indo-Iranian), here is how I believe Bronze Age Proto-Indo-European might have sounded for the cardinal numbers from 1 to 10. My aim here is to reconstruct realistic-sounding numbers, not unpronounceable linguists' fantasies.
1. Oins in Centum/R1b ; Aik in Satem/R1a
2. Dwo
3. Trei
4. Catur* in Centum/R1b ; Chatur in Satem/R1a (Latv. četri, OCS četyre, Pol. cztery, Russ. četyre, Persian čahār, Vedic catvāras)
5. Pempe in Centum/R1b (Gaulish pempe, Welsh pump, Osc. pompe, Umbr. pumpe) ; Penca in Satem/R1a (Lith. penkì, OPruss. pēnkjāi, OCS pętĭ, Vedic pañca, Avestan panca, Persian panča)
6. Seks in Centum/R1b ; Ses in Satem/R1a
7. Septa
8. Octo in Centum/R1b ; Ashta in Satem/R1a
9. Nava (Gaulish navan, Welsh naw, Latin novem, Umbrian nuvim, Vedic nava, Avestan nauua)
10. Deca in Centum/R1b (Gaulish decam, Welsh deg, Umbrian/Latin decem, Greek deka) ; Dasa in Satem/R1a (OPruss. desīmtan, Latv. desmit, Lith. dẽšimt, OCS desętь, Russian desjat', Vedic dáśa, Avestan dasa)
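Abbreviations:
- Latv. = Latvian
- Lith. = Lithuanian
- OCS = Old Church Slavonic
- OPruss. = Old Prussian
- Osc. = Oscan
- Pol. = Polish
- Russ. = Russian
- Umbr. = Umbrian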
* Note that for number 4 the original k sound in Catur became p in most Italo-Celtic languages (except Latin and Irish) and f in Germanic languages.
Analysis
The main characteristic of the Centum-Satem isogloss is that the k sounds of Centum languages correspond to s, sh or ch in Satem languages. As for the original words for one hundred, they were probably closer to cantam (as in Gaulish) and satam (as in Vedic) than to centum and satem.
Overall, Gaulish and Vedic Sanskrit appear to be the best proxies for Centum and Satem languages respectively. Unsurprisingly, their populations also have some of the highest frequencies of haplogroups R1b and R1a respectively.
That may be because Gaulish evolved in a stable geographic area around the Alps (Únětice > Tumulus > Hallstatt > La Tène) before expanding rapidly into France in the Iron Age, only a few centuries before the language was first recorded in writing.
In contrast, the Italic branch also originated in the Alps, but Italic peoples mixed with the indigenous non-IE populations of the Italian peninsula, where their language became corrupted. This is particularly true of Latin, which replaced the Italo-Celtic p sound with a qu (kw) sound (as in quattuor and quīnque, instead of petor and pumpe/pempe in Umbrian and Gaulish).
The Germanic branch also underwent a major phonetic shift in which, among other changes, p became f and k became kh, then h. This first Germanic consonant shift is known as Grimm's Law. The process was further amplified in the early centuries CE by the High German consonant shift. This confirms that Proto-Germanic originated as a deformation of Proto-Celtic through the migration of R1b people from southern Germany to northern Germany and Scandinavia. Unable to pronounce Celtic sounds properly, the natives of Scandinavia and northern Germany reshaped the consonants to match those of their original language (or perhaps also to suit their different mandibular and guttural morphology).
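Because Grimm's Law is a set of regular consonant correspondences, its core can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not a model of real Proto-Germanic: the function `grimm_shift` and the table `GRIMM_SHIFT` are hypothetical names, the input is assumed to be a simplified lowercase transcription in which aspirated stops are written bh, dh, gh and labiovelars kʷ, gʷ, and both Verner's Law and the non-shifting of stops after s (sp, st, sk) are ignored. Each consonant is rewritten once in a single left-to-right pass, so that b → p does not go on to feed p → f and merge the two series.

```python
# Toy sketch of Grimm's Law (the first Germanic consonant shift).
# Assumptions (hypothetical, for illustration only):
#   - input is a simplified, lowercase PIE-style transcription;
#   - aspirates are written "bh", "dh", "gh"; labiovelars "kʷ", "gʷ";
#   - Verner's Law and the blocking of the shift after s are ignored.

GRIMM_SHIFT = {
    # 1. voiceless stops -> voiceless fricatives (k -> h via an earlier kh)
    "p": "f", "t": "þ", "k": "h", "kʷ": "hʷ",
    # 2. voiced stops -> voiceless stops
    "b": "p", "d": "t", "g": "k", "gʷ": "kʷ",
    # 3. voiced aspirates -> plain voiced consonants
    #    (simplified: they first became voiced fricatives)
    "bh": "b", "dh": "d", "gh": "g",
}

# Longest keys first, so "bh" is tried before "b" and "kʷ" before "k".
_KEYS = sorted(GRIMM_SHIFT, key=len, reverse=True)


def grimm_shift(word: str) -> str:
    """Rewrite each consonant once, scanning left to right.

    Every segment is replaced in a single pass, so the output of one
    rule (e.g. b -> p) is never fed into another (p -> f).
    """
    out = []
    i = 0
    while i < len(word):
        for key in _KEYS:
            if word.startswith(key, i):
                out.append(GRIMM_SHIFT[key])
                i += len(key)
                break
        else:
            # Vowels, resonants and unlisted symbols pass through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


for form in ["dwo", "trei", "penkʷe", "bhrater"]:
    print(form, "->", grimm_shift(form))
```

The output for dwo happens to coincide with English two, while the others only approximate attested Germanic forms such as Gothic þreis and fimf, since later sound changes are not modelled.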
In the Satem branch, Vedic Sanskrit and Avestan appear to be the closest to the original PIE source. The reason is probably that R1a Indo-European speakers did not encounter much resistance from indigenous populations in Central Asia and managed to impose their language and culture relatively easily, keeping the language purer. Once they reached South Asia, they established the caste system, which kept the ruling class clearly distinct from the conquered population, once again preserving the authenticity of their culture and language with minimal indigenous influence.
In contrast, the Proto-Slavs and Proto-Balts ended up mixing heavily with local central and north-east European populations (especially those represented by Y-haplogroups I2a1b and N1c1), which led to a degree of linguistic hybridisation similar to that experienced by Centum speakers in the Nordic countries through admixture with I1 and I2a2 populations.