In the last few weeks David at Eurogenes figured out that D-stats can be used to produce very accurate admixture proportions for Europeans (see here). This wouldn't be possible if we didn't have so many ancient European genomes: we know who (most of) the ancestors of Europeans are and in what proportions. We don't know who the ancestors of Middle Easterners are, because our only ancient West Asian genomes are from Mesolithic/Upper Paleolithic Georgia (CHG), Neolithic Turkey/Anatolia (EEF), and Bronze Age Armenia.

However, using D-stats and very simple math, it's possible to get an idea of who the non-CHG/EEF/Exotic ancestors of Middle Easterners were. First you need at least one assumed ancestor of a Middle Eastern population (I'll call them A). Then you multiply A's score in a D-stat by a percentage from 0-99%, which is the share of ancestry you assume comes from A. After that, you subtract the result from the Middle Eastern population's score in the same D-stat (I'll call this B), then divide by the remaining share (100% minus the percentage you used). The result is the score of a ghost ancestor of the Middle Eastern population (I'll call it C). In short, C = (B - p*A) / (1 - p), where p is the assumed share of A written as a fraction.
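The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described: the function name `ghost_score` and the symbol `p` (the assumed share of ancestry from A, as a fraction) are labels I've made up for this sketch, and the numbers in the usage lines are invented to show the round trip, not real D-stats.

```python
def ghost_score(a, b, p):
    """Score of the ghost ancestor C, assuming B = p*A + (1 - p)*C.

    a: D-stat score of the assumed ancestor (A)
    b: D-stat score of the mixed population (B)
    p: assumed share of ancestry from A, as a fraction (0 <= p < 1)
    """
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and below 1")
    return (b - p * a) / (1 - p)


# Made-up scores, for illustration only (not real D-stats).
a, b, p = 0.30, 0.36, 0.5
c = ghost_score(a, b, p)

# Mixing A and C back together in the assumed shares recovers B.
assert abs(p * a + (1 - p) * c - b) < 1e-12
```

The check at the end is the whole idea in reverse: if B really were p parts A and (1 - p) parts C, blending their scores would give back B's score.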

You can get an unlimited number of possible ghost ancestors. I don't know of any mathematical method that can prove one of them is more likely than another. You just have to use common sense. If the D-stat results of a ghost ancestor make absolutely no sense, you can scratch it off. If you have an example of a real ancestor, you'll usually be able to narrow the ghost ancestor down to a few possibilities.

I've already used this method on many Middle Eastern populations. It looks like all Middle Easterners are mostly EEF Cousin + Other (CHG, Africa, Europe, East Asia, South Asia), but that they also have ancestry from ancient Middle Easterners who were equally related to EEF and CHG. CHG ancestry looks like it's mainly important in Northern West Asia. The more outgroups you use for the D-stats, the more accurate the results will be. I'll use this thread to post ghost ancestor results for Middle Easterners.

If we had used this method before Yamnaya and EHG genomes were sampled, we would have been able to get a good idea of what Yamnaya was genetically. I've tested it. When I use Middle Neolithic Europeans as an assumed ancestor of Northern Europeans, their non-MN side is very similar to Yamnaya and EHG.


Here's an example with Mozabite. I'll assume they have African ancestry.
A=D(Chimp, Nigeria)(Mbuti, EEF)=0.1021
B=D(Chimp, Mozabite)(Mbuti, EEF)=0.3816
C=Score of Mozabite's Eurasian ancestors.

C=D(Chimp, Mozabite_Eurasian_side)(Mbuti, EEF)=0.4046

I don't know how much African admixture Mozabite has, so I don't just model them as 20% (0.20) African; I model them as everything from 1% to 99% African. The results for Mozabite's Eurasian ancestor only make sense when Mozabite is modeled as 20-30% African. There's no formal way to prove that the results you get when Mozabite is modeled as, say, 95% African are impossible; you just have to use common sense and discard them.

Here's the kind of result Mozabite's Eurasian ancestor gets if Mozabite is modeled as 40% African.
D(Chimp, Mozabite_Eurasian_side)(Mbuti, Han)=0.41
D(Chimp, Mozabite_Eurasian_side)(Mbuti, EEF)=0.50

That's unrealistic. The results get crazier and crazier the more you raise the African ancestry percentage. No human being who ever existed could have been that close to EEF and Han Chinese at the same time. An EEF/Chinese hybrid wouldn't get results even close to that.

So, using this method, I would theorize that Mozabites are about 20% African and that their Eurasian side is very similar to EEF. This theory makes a lot of sense. However, their Eurasian side isn't exactly like EEF; it looks more like a cousin of EEF, or EEF with heavy admixture from unknown ancient West Asians.

Here's another example with Georgians. I'll assume Georgians have CHG admixture.
A=D(Chimp, CHG)(Mbuti, EEF)=0.3746
B=D(Chimp, Georgian)(Mbuti, EEF)=0.3915
C=Score of Georgians' non-CHG side.

C=D(Chimp, Non-CHG_side)(Mbuti, EEF)=0.403

When I model Georgians as 40% CHG, their non-CHG ancestor is similar to EEF but not exactly the same as EEF. When I model Georgians as 60%+ CHG, their non-CHG ancestor gets crazy results. So, using this method, I would theorize that Georgians are 40-50% CHG and 50-60% EEF-cousin.
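Trying every percentage by hand is tedious, so here is a rough sketch of the sweep using the CHG and Georgian scores quoted above. The function name `ghost_sweep` is my own, and it assumes the same mixing formula, C = (B - p*A) / (1 - p). Deciding which rows "make sense" is still a judgment call; the code only produces the candidates.

```python
def ghost_sweep(a, b, percents=range(1, 100)):
    """Return {percent: ghost score} for each assumed share of A."""
    results = {}
    for pct in percents:
        p = pct / 100
        results[pct] = (b - p * a) / (1 - p)
    return results


# Scores quoted above for D(Chimp, X)(Mbuti, EEF).
CHG = 0.3746       # A, the assumed ancestor
GEORGIAN = 0.3915  # B, the mixed population

sweep = ghost_sweep(CHG, GEORGIAN)
for pct in (20, 40, 60, 80, 95):
    print(f"{pct}% CHG -> non-CHG side scores {sweep[pct]:.4f}")
```

At 40% this gives about 0.403, the figure quoted above. Because Georgians score higher than CHG in this stat, the ghost's score keeps climbing as the assumed CHG share goes up, which is why the high-percentage models end up looking unrealistic.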

To show that this method works, I'll use it on ancient and modern Europeans, with an ancient European population as the assumed ancestor.

Srubnaya. I'll assume Yamnaya is an ancestor of Srubnaya.
A=D(Chimp, Yamnaya)(Mbuti, EEF)=0.386
B=D(Chimp, Srubnaya)(Mbuti, EEF)=0.3967
C=Score of Srubnaya's non-Yamnaya ancestors.

C=D(Chimp, Non-Yamnaya_Side)(Mbuti, EEF)=0.4395

When I model Srubnaya as 80% Yamnaya, their non-Yamnaya 20% scores exactly like Middle Neolithic Europeans. Of the 99 ghost ancestors for Srubnaya, every one either made no sense or was clearly a mixture of Yamnaya and Middle Neolithic European. If we had no Neolithic European genomes and only had Srubnaya and Yamnaya, using this method we would easily discover that Srubnaya is a Yamnaya + Sardinian-like mixture.
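The same mixing formula can be turned around: if you already have a score for the second ancestor, you can solve for the share of the first. The sketch below does that with the three Srubnaya numbers quoted above. The function name `implied_share` is hypothetical, and using the rearranged formula p = (C - B) / (C - A) this way is my own illustration rather than a step described in the method.

```python
def implied_share(a, b, c):
    """Share of ancestry from A implied by B = p*A + (1 - p)*C."""
    if a == c:
        raise ValueError("A and C score the same, so the share is undetermined")
    return (c - b) / (c - a)


# Scores quoted above for D(Chimp, X)(Mbuti, EEF).
YAMNAYA = 0.386       # A, the assumed ancestor
SRUBNAYA = 0.3967     # B, the mixed population
NON_YAMNAYA = 0.4395  # C, the ghost score quoted above

p = implied_share(YAMNAYA, SRUBNAYA, NON_YAMNAYA)
print(f"Implied Yamnaya share: {p:.0%}")
```

With these inputs the share comes out at about 80%, matching the model described above. If a real candidate for the second ancestor has been sampled, its score in the same D-stat could be passed in as `c` instead of the ghost score.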

Here's another example using modern Lithuanians. I'll assume Corded Ware Germany is an ancestor of Lithuanians.
A=D(Chimp, Corded Ware_Germany)(Mbuti, WHG)=0.4043
B=D(Chimp, Lithuania)(Mbuti, WHG)=0.4123
C=Score of Lithuanians' non-Corded Ware ancestors.

C=D(Chimp, Non-Corded Ware_Side)(Mbuti, WHG)=0.4443

A2=D(Chimp, Corded Ware_Germany)(Mbuti, EEF)=0.3958
B2=D(Chimp, Lithuania)(Mbuti, EEF)=0.3995
C2=Score of Lithuanians' non-Corded Ware ancestors.

C2=D(Chimp, Non-Corded Ware_Side)(Mbuti, EEF)=0.4143

Using this method, Lithuanians come out mostly Corded Ware, with significant admixture from people who were heavy in WHG and EEF.
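When there is more than one D-stat, the ghost ancestor has to be computed with the same assumed share in each of them. Here is a small sketch that does this for the two Lithuanian stats above. The name `ghost_profile` is made up for this example, and the 80% Corded Ware share is an assumption on my part: it is the share at which the formula C = (B - p*A) / (1 - p) reproduces the C and C2 values quoted above.

```python
def ghost_profile(stats, p):
    """Ghost score for every D-stat in `stats` at one assumed share p.

    stats maps a label to (A score, B score); p is a fraction below 1.
    """
    return {label: (b - p * a) / (1 - p) for label, (a, b) in stats.items()}


# (Corded Ware_Germany, Lithuania) scores quoted above, keyed by the
# last population in D(Chimp, X)(Mbuti, ...).
LITHUANIAN_STATS = {
    "WHG": (0.4043, 0.4123),
    "EEF": (0.3958, 0.3995),
}

# 0.80 is an assumed share, chosen because it matches the quoted results.
profile = ghost_profile(LITHUANIAN_STATS, p=0.80)
for label, score in profile.items():
    print(f"D(Chimp, Non-Corded Ware_Side)(Mbuti, {label}) = {score:.4f}")
```

Keeping all the stats in one table makes it easy to add more outgroups later and to see the ghost ancestor's whole profile at a given percentage, rather than judging one number at a time.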