Revising the reconstruction of Proto-Indo-European numerals

Maciamo · Jul 31, 2014

Many linguists have attempted to reconstruct what the original Proto-Indo-European numerals may have sounded like. Unfortunately they tend to end up with overly theoretical prototypes that are unlikely to have ever been uttered by people. Here are two examples of reconstructions dating from 1995.

Number	Reconstruction (Sihler)	Reconstruction (Beekes)
one	*Hoi-no-/*Hoi-wo-/*Hoi-k(ʷ)o-; *sem-	*Hoi(H)nos
two	*d(u)wo-	*duoh₁
three	*trei- (full grade) / *tri- (zero grade)	*treies
four	*kʷetwor- (o-grade) / *kʷetur- (zero grade)	*kʷetuōr
five	*penkʷe	*penkʷe
six	*s(w)eḱs; originally perhaps *weḱs	*(s)uéks
seven	*septm̥	*séptm
eight	*oḱtō, *oḱtou or *h₃eḱtō, *h₃eḱtou	*h₃eḱteh₃
nine	*(h₁)newn̥	*(h₁)néun
ten	*deḱm̥(t)	*déḱmt

It is painfully obvious how these linguists were influenced by their classical education, giving them a strong bias in favour of Latin- and Greek-sounding PIE reconstructions. For example, the two w sounds in kʷetwor are found only in Latin (quattuor) and in no other Indo-European language. It's very odd that both Sihler and Beekes reconstructed number four this way. I also don't see the need for a w in penkʷe. The final m in séptm is just as fanciful and also found exclusively in Latin. One can also wonder where the h in Hoi-no/Hoi(H)nos and h₃eḱtō/h₃eḱteh₃ comes from.

In my opinion, Latin and Greek are pretty poor proxies for reconstructing the original PIE because the ancient Greeks and Romans were thoroughly admixed with indigenous non-Indo-European populations (Pelasgians, Minoans, Etruscans) that severely altered the Indo-European pronunciation and vocabulary.

I believe that population genetics can be very useful in assessing the influence of non-Indo-European populations in regions that now speak Indo-European languages. Only about 30% of the Greek Y-DNA haplogroups are of Indo-European origin, much less than the average of other Indo-European-speaking regions. The Greek branch of IE languages is probably one of the most hybridised, along with the Anatolian and Albanian branches.

Better proxies are Indo-Iranian, Baltic, Slavic and Celtic languages. I would also be careful with Germanic languages as they also underwent heavy hybridisation.

The obvious reason why ancient Greek and Latin are so influential in PIE reconstructions is that there is a wealth of written documents in those ancient languages not found in ancient Balto-Slavic or even Celtic languages. Anatolian languages and Mycenaean Greek are the oldest recorded IE languages, but that doesn't mean they are the purest. Far from it, they are surely those with the strongest non-IE influence as they were languages imposed by a small illiterate elite on advanced ancient civilisations. The more culturally advanced the conquered populations compared to the conquerors, the higher the chance that they will retain all or part of their language and culture. Germanic peoples adopted Latin after invading the Roman Empire, and the Mongols and Manchus adopted Chinese after becoming emperors of China.

It works the other way round too. The Sardinians, who spoke a non-IE language until approximately 2000 years ago, were so close to Rome and its cultural influence that they ended up adopting Latin with only a minimal genetic influx from Latium to Sardinia.

I also think that the Centum-Satem isogloss dates back to the very foundation of the Proto-Indo-European culture and society in the Pontic-Caspian Steppes during the Late Neolithic or Chalcolithic. Although R1a and R1b populations did intermix to some extent, there were always two main groups: the southern R1b-dominant group that migrated west (Centum) and the northern R1a-dominant group that migrated east then south (Satem). Although the two groups adopted a common lingua franca, some differences in pronunciation probably existed from the start, so it is futile to try to absolutely unify the two in every case.

Taking all this into account, and giving more weight to "purer" branches of the Indo-European family (Celtic, Balto-Slavic, Indo-Iranian), here is what I believe Bronze Age Proto-Indo-European could have sounded like for cardinal numbers from 1 to 10. My aim here is to reconstruct realistic-sounding numbers, not unpronounceable linguist's fantasies.

1. Oins in Centum/R1b ; Aik in Satem/R1a
2. Dwo
3. Trei
4. Catur* in Centum/R1b ; Chatur in Satem/R1a (Latv. četri, OCS četyre, Pol. cztery, Russ. četyre, Persian čahār, Vedic catvāras)
5. Pempe in Centum/R1b (Gaulish pempe, Welsh pump, Osc. pompe, Umbr. pumpe) ; Penca in Satem/R1a (Lith. penkì, OPruss. pēnkjāi, OCS pętĭ, Vedic pañca, Avestan panca, Persian panča)
6. Seks in Centum/R1b ; Ses in Satem/R1a
7. Septa
8. Octo in Centum/R1b ; Ashta in Satem/R1a
9. Nava (Gaulish navan, Welsh naw, Latin novem, Umbrian nuvim, Vedic nava, Avestan nauua)
10. Deca in Centum/R1b (Gaulish decam, Welsh deg, Umbrian/Latin decem, Greek deka) ; Dasa in Satem/R1a (OPruss. desīmtan, Latv. desmit, Lith. dẽšimt, OCS desętь, Russian desjat', Vedic dáśa, Avestan dasa)

Abbreviations:
- OCS = Old Church Slavonic
- OPruss. = Old Prussian.

* Note that for number 4 the original k sound in Catur became p in most Italo-Celtic languages (except Latin and Irish) and f in Germanic languages.

Analysis

The main characteristic of the Centum-Satem isogloss is that the k sounds in Centum languages become s or sh or ch in Satem languages. As for the original words for one hundred, they were probably closer to cantam (as in Gaulish) and satam (as in Vedic) than centum and satem.

Overall Gaulish and Vedic Sanskrit appear to be the best proxies for Centum and Satem languages respectively. Unsurprisingly their populations also have some of the highest frequencies for haplogroup R1b and R1a respectively.

That may be because Gaulish evolved from a stable geographic area around the Alps (Unetice > Tumulus > Hallstatt > La Tène) before experiencing a quick expansion to France in the Iron Age, only a few centuries before the language was first recorded in writing.

In contrast, the Italic branch also originated in the Alps, but Italic people mixed with the indigenous non-IE populations of the Italian peninsula, where the language got corrupted. This is particularly true of Latin, which replaced the Italo-Celtic p sound with a qu (kw) sound (as in quattuor and quīnque instead of the petor and pumpe/pempe of Umbrian and Gaulish).

The Germanic branch also underwent a major phonetic shift, in which the p became f and the k became kh then h, among others. This first Germanic consonant shift is known as Grimm's Law. The process was further amplified in the early centuries CE by the High German consonant shift. This confirms that the Proto-Germanic language originated as a deformation of Proto-Celtic through the migration of R1b people from southern Germany to northern Germany and Scandinavia. Unable to pronounce Celtic sounds properly, the natives of Scandinavia and northern Germany reshaped the consonants to match those of their original language (or perhaps also their different mandibular and guttural morphology).
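To make the two changes named above concrete, here is a minimal Python sketch that applies only p > f and k > h to a few words. It is a toy, not a model of Grimm's Law as a whole: the function name and the simplified spellings are assumptions made for illustration, Latin pater and cornu (respelled kornu) merely stand in for words with unshifted consonants, and phonetic context is ignored entirely.

```python
# Toy illustration of the two changes named above: p > f and k > h.
# Spellings are simplified and every p/k is shifted regardless of context,
# which real sound laws would not do.
GRIMM_SUBSET = str.maketrans({"p": "f", "k": "h"})


def apply_grimm_subset(word: str) -> str:
    """Replace every p with f and every k with h in a lowercased word."""
    return word.lower().translate(GRIMM_SUBSET)


for source in ("pater", "kornu", "pempe"):
    print(source, "->", apply_grimm_subset(source))
```

The results (fater, hornu, femfe) can be compared with English father and horn and with Gothic fimf.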

In the Satem branch, Vedic Sanskrit and Avestan appear to be the closest to the original PIE source. The reason is probably that R1a Indo-European speakers did not encounter a lot of resistance from indigenous populations in Central Asia and managed to impose their language and culture relatively easily, keeping the language purer. Once they reached South Asia, they established the caste system that kept the ruling class clearly distinct from the conquered population, once again preserving the authenticity of their culture and language with minimal indigenous influence.

In contrast, the Proto-Slavs and Proto-Balts ended up mixing heavily with local central and north-east European populations (especially represented by Y-haplogroups I2a1b and N1c1), which led to a similar degree of linguistic hybridization as the Centum language speakers experienced in Nordic countries by admixing with I1 and I2a2 populations.

Sennevini · Aug 1, 2014

Watch out here. You're making assumptions which are not correct. As a linguist, I have to intervene:

1) *oin- /*oik-: that's about right. About the *H in front: this is the laryngeal *h1, which is likely to be just a glottal stop.
2) *dwo: probably a laryngeal at the end should be placed, because that is the dual suffix.
3) *trei- : I would add the plural ending, but the stem is ok.
4) *catur/*chatur: just no. It has to be from *kwetwor-/*kwetur-; Latin preserved the whole form, only the a in it is weird; Mycenaean Greek had qetor-; Greek has tettares; the t can only arise from *kw before *e/i; Germanic has a labial, which sometimes arose from *kw.
5) *pempe/*penca: I would stick to *penkwe, because Latin has *quinque (from *pinque; for the *p>*qu change, see *percus > *quercus), and Greek has pente (with *t < *kw). Of course, *kwe > *ca is a common Satem evolution.
6) *seks/*ses: about right; *ks in Satem would turn quickly to a form with *s.
7) No, it's *septm, with a vocalised *m. PIE could have complex nuclei.
8) I would also agree with the *o-vocalism.
9) I would say *newn, with a vocalised *n.
10) I would say *dekm, with a vocalised *m.

This isn't just a guessing game; this is an intricate form of research in which exact laws of phonetic change operate as well as analogy of paradigms.
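The point about exact laws can be illustrated with the one conditioned change mentioned in this post for Greek: *kw gives t only before e or i. The following Python sketch encodes just that single rule with a regular-expression lookahead. The function name and the plain-ASCII spellings are assumptions for illustration, and no other Greek development is modelled.

```python
import re

# One conditioned rule, as described in the post: kw becomes t only when the
# next sound is e or i. The lookahead (?=[ei]) tests the following vowel
# without consuming it, so the vowel itself is left in place.
KW_BEFORE_E_OR_I = re.compile(r"kw(?=[ei])")


def kw_to_t(word: str) -> str:
    """Apply kw > t before e/i; leave kw untouched in any other position."""
    return KW_BEFORE_E_OR_I.sub("t", word)


for form in ("penkwe", "kwetwor", "kwo"):
    print(form, "->", kw_to_t(form))
```

With this rule alone, penkwe comes out as pente, matching the Greek form cited in point 5, while kwetwor only gets as far as tetwor, and a kw before o is left unchanged.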

MOESAN · Aug 8, 2014

I agree concerning four and five (4, 5): Brittonic Celtic had something like petwar/petwor for 4, and the modern forms pum/pemp justify the penkw- reconstruction for 5.

FrankN · Aug 8, 2014

If we assume Germanic "finger" to be related to "five", *penkwe looks like the appropriate root.
Under a similar assumption, ten should have been *decm to relate it to German "Zehen" (toes) (including the Germanic c>h first sound shift).

Maciamo · Aug 8, 2014

Sennevini said:
Watch out here. You're making assumptions which are not correct. As a linguist, I have to intervene:

1) *oin- /*oik-: that's about right. About the *H in front: this is the laryngeal *h1, which is likely to be just a glottal stop.
2) *dwo: probably a laryngeal at the end should be placed, because that is the dual suffix.
3) *trei- : I would add the plural ending, but the stem is ok.
4) *catur/*chatur: just no. It has to be from *kwetwor-/*kwetur-; Latin preserved the whole form, only the a in it is weird; Mycenaean Greek had qetor-; Greek has tettares; the t can only arise from *kw before *e/i; Germanic has a labial, which sometimes arose from *kw.
5) *pempe/*penca: I would stick to *penkwe, because Latin has *quinque (from *pinque; for the *p>*qu change, see *percus > *quercus), and Greek has pente (with *t < *kw). Of course, *kwe > *ca is a common Satem evolution.
6) *seks/*ses: about right; *ks in Satem would turn quickly to a form with *s.
7) No, it's *septm, with a vocalised *m. PIE could have complex nuclei.
8) I would also agree with the *o-vocalism.
9) I would say *newn, with a vocalised *n.
10) I would say *dekm, with a vocalised *m.

This isn't just a guessing game; this is an intricate form of research in which exact laws of phonetic change operate as well as analogy of paradigms.

So in short you agree with what you have learned at university, which is based essentially on the works of linguists giving more weight to Latin and Greek as sources of PIE. This is exactly what I am trying to contest in this thread.

Dalmat · Aug 8, 2014

In Croatian it is:

1-jedan(j=y)
2-dva
3-tri
4-četri(č=ch)
5-pet
6-šest(š=sh)
7-sedam
8-osam
9-devet
10-deset

Jedan is interesting: it could have a different source. If we split it into je dan, it would mean "a day" or "is day".
So it could mean that jedan comes from that singular basic cycle in nature and is a merged word: aday, or yes+day -> yeday.

arvistro · Aug 20, 2014

1. Oins in Centum/R1b ; Aik in Satem/R1a
1) *oin- /*oik-: that's about right. About the *H in front: this is the laryngeal *h1, which is likely to be just a glottal stop.

Hmm. Baltic and Slavic languages then opted for Centum here.
Oins = Viens, Vienas in Latvian/Lithuanian
Oins = Odin in Russian and Jedan/jedin in Slavic languages.

Mmiikkii · Nov 10, 2021

Maciamo said:
giving them a strong bias in favour of Latin- and Greek-sounding PIE reconstructions. For example, the two w sounds in kʷetwor are found only in Latin (quattuor) and in no other Indo-European language. It's very odd that both Sihler and Beekes reconstructed number four this way.

Wait, Maciamo, did you realize that in the Latin languages there is the 'Q' sound,
while Germanic languages use the 'P'-derived letter 'F' for the number 4 (four, and vier in Dutch and German)?

Doesn't that look like 'Q-Celtic' and the later 'P-Celtic', just with IE languages instead of Celtic?

It also looks weird because Q-Celtic is older, as are the southern civilizations. P-Celtic happened after Q, just as the Germanics came after the Romans.
Could that indicate that Germanic is more prone to the 'P' sound than to the 'Q' one?

Maciamo · Nov 11, 2021

Mmiikkii said:
Wait, Maciamo, did you realize that in the Latin languages there is the 'Q' sound,
while Germanic languages use the 'P'-derived letter 'F' for the number 4 (four, and vier in Dutch and German)?
Doesn't that look like 'Q-Celtic' and the later 'P-Celtic', just with IE languages instead of Celtic?
It also looks weird because Q-Celtic is older, as are the southern civilizations. P-Celtic happened after Q, just as the Germanics came after the Romans.
Could that indicate that Germanic is more prone to the 'P' sound than to the 'Q' one?

You bring up an interesting point. I agree that the Germanic F derives from the Celtic P. This is because the Proto-Germanic language developed quite late. Whereas the first Proto-Celtic R1b-P312 people reached Western Europe between 2500 and 1800 BCE (the likely speakers of Q-Celtic languages), R1b only reached Scandinavia from 1700 BCE onwards, and the Germanic language and ethnicity did not fully coalesce until the Nordic Iron Age, starting c. 500 BCE. What's more, the R1b tribes that moved into Scandinavia came from Central Europe, close to the region where P-Celtic developed (Hallstatt, La Tène). So in my view the R1b people who started the Nordic Bronze Age culture spoke a Proto-Celto-Germanic language in which the Q>P shift had already happened, and the P>F shift took place later in Scandinavia.

MOESAN · Nov 11, 2021

Numbers, as trade tools, are not the best examples to take for reconstruction, IMO, all the more so when the nouns concerned are already close enough between languages.
Concerning Germanic, I doubt they were still in osmosis with the Celts when the Gw-/P- shift occurred, though I have no crystal ball. On the contrary, they could have been under the political/commercial influence of P-Celts.
The Slavs have nouns of the °[tchet(i)ri] kind, corresponding to Qwattwor and apparently with no Qw->P- shift, at first sight. The loss of the 'w' appendage after the [K] sound is common in palatalising tongues.
I'm tempted to see these F- numbers for '4' as a loan from late Celtic, because Germanic otherwise overwhelmingly has Hw- evolutions for other *Qw- derived words, and the Latin word doesn't seem here to be a P-> Qw- evolution as in Qwinqwe- (assimilation).
Taranis could say something here.
