G25 g25 VS qpadm/admixtools2 comparison.

i want to be able to use it too

but it seems to be very complicated
 
Eupator can you test this three-way model for I15707 (it is a post-mdv Alb sample with extremely low Slavic). His biggest component is consistently picked up in modern Albs G25 and I am curious if it holds up in qpadm.

RSzpURr.png


Suggested model:
component 1: I18832 or I10379, these two profiles are very interchangeable.
component 2: I16253 or I14690, I16253 is preferred but I do not see it in the 1240k list
component 3: Art039

PdIkk6Z.png


Used the same model in qpadm, p-value seems really strong, is that a 0.939? But the se values are bad. This is what I used for right, because the original from your tutorial would give me a errors as some of the populations referenced were not there(in the geno file I am assuming).

"right = c('Mbuti.DG', 'Ethiopia_4500BP.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_Karelia_HG.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')
> mypops = c('Mbuti.DG', 'Ethiopia_4500BP.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_Karelia_HG.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','Greek_1.DG','Spain_Greek_oAegean','Polish.DG','Armenian.DG')"
 
From what I understand, anything between 0.05-1 is considered a good p-value for biological relevance.

However, other metrics are important, such as standard errors, which you want to keep low. Check out this study as a de facto guide:


Also check out this post:

"While the overall distributions of P-values differ between optimal and nonoptimal qpAdm models, we note that for individual replicates the most optimal model is not necessarily assigned the highest P-value. We find that the P-value associated with the best model (sources 5 and 9) produces the highest P-value in only 48% of cases (Supplementary Table S5), when the standard reference set is used (13, 12, 10, 7, and 0). In frequentist methods such as qpAdm, P-values below the nominal significance level are judged wrong enough to be rejected, but P-values do not represent probabilities of models being correct. As Figure 2 shows, qpAdm is fairly conservative in rejecting models. For example, the model which posits populations 4 and 9 as sources may be considered wrong because population 4 is more closely related to source population 0 than it is to the target population 14. Still, P-values under this model are almost uniformly distributed (Figure 2B) and for a given data set the P-value for this model could easily be larger than the P-value for the correct model (Figure 2A). In contrast, models that diverge strongly from the truth are always rejected, as when populations 11 and 9 are used as sources (Figure 2F). Therefore, in cases where multiple models are assigned plausible P-values (i.e., P ≥ 0.05), we caution that P-value ranking (i.e., selecting the model that is assigned the highest P-value) should not be used to identify the best model. Methods for distinguishing between multiple models will be discussed further in the section on comparing admixture models."
 
In my model, both p-value and se more than just pass, the scores are high in both.
 
One way model.

YrNkhhx.png


Alb post-mdv I15707, is basically same as MKD_IA_I10379 and Hun_La_Tene_I18832_E-V13, this is a super good se value.
My man, first of all kudos for running qpadm.
In what methodology did you use to label Macedonia IA and Hungary La Tenne as Bassarabi?
This is the first time I see qpadm used for a test with one sample on the left. Did you get inspired somewhere?



Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and
to calculate the relative proportion of ancestry that can be ascribed to each source population in the model
qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model.


The Adm in qpAdm refers to admixture. But even assuming one can use it for one population on the left as you did and the underlying math works. It would have little comparative power. As in, if two populations both pass the test with different passing p values, you would have no way to tell which one is the actual ancestral population. (Hint: higher p is meaningless outside the experiment)


Basically if it works on one pop, which I am not sure, it wouldn't say
Alb post-mdv I15707, is basically same as MKD_IA_I10379 and Hun_La_Tene_I18832_E-V13, this is a super good se value.
But rather:
MKD_IA_I10379 and Hun_La_Tene_I18832_E-V13 *could* be ancestral to Alb post-mdv I15707.

And the se values for one pop better be good. Tested it, and didnt see a single bad one, even in failing models.
 
Last edited:
In my model, both p-value and se more than just pass, the scores are high in both.
PdIkk6Z.png


Used the same model in qpadm, p-value seems really strong, is that a 0.939? But the se values are bad. This is what I used for right, because the original from your tutorial would give me a errors as some of the populations referenced were not there(in the geno file I am assuming).

"right = c('Mbuti.DG', 'Ethiopia_4500BP.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_Karelia_HG.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')
> mypops = c('Mbuti.DG', 'Ethiopia_4500BP.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_Karelia_HG.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','Greek_1.DG','Spain_Greek_oAegean','Polish.DG','Armenian.DG')"

Almost 40% se is definitively not good. A test would pass with <5%
 
can i ask any of you per private messages to model ethnicities with qpAdm ? if any of you are ok with it please quote me and tell me
 
I am away so not able to test your requests, but some of the other friends here have a good grasp and are familiar with the sampling more than me, I am still stuck at southern arc samples, haven't worked with the new ones.
 
My man, first of all kudos for running qpadm.
In what methodology did you use to label Macedonia IA and Hungary La Tenne as Bassarabi?
This is the first time I see qpadm used for a test with one sample on the left. Did you get inspired somewhere?







The Adm in qpAdm refers to admixture. But even assuming one can use it for one population on the left as you did and the underlying math works. It would have little comparative power. As in, if two populations both pass the test with different passing p values, you would have no way to tell which one is the actual ancestral population. (Hint: higher p is meaningless outside the experiment)


Basically if it works on one pop, which I am not sure, it wouldn't say

But rather:


And the se values for one pop better be good. Tested it, and didnt see a single bad one, even in failing models.

I asked eupator to run a model for me and all I got was a thumbs up. So I rolled my sleeves followed the instruction from the main thread and I got it running. Suprisingly it didn't take long, but running models and modifying sources/left command was time consuming. Those two were first models.

Bassarabi is based on G25 autosomal run. I also checked in qpad, in a general model they all pass. In my current model which is much more precise MJ12 and Hungary(E-V13) I18832 pass, MKD I10379 passes with MJ12 but not I18832. MJ12 is the bridge that works with both. I think the SNP coverage is not the same. for MKD I10379, either way, post-Kukes samples can all be run successfully with I18832. There will be a time when new samples will make better component.

I think I am at good spot, and will not bother running any more models or trying new ideas. Example of how far I reached.

Ea1C4n8.png

FQKBjPE.png

W0U7YGa.png


Middle one is a slight fail but shared it anyway.
 
Last edited:
^^models look statistically robust. Pvals are within biological relevance, and standard errors are relatively low. They're viable models.

Good job!

The Nassius sample(R6764) is IA profile, Roman period from eastern Serbia, and it is E-V13. I got other models but will post them in the right thread. I regret not having set up 1240K+HO to run the other Alb modern samples. That means I will have to do some of the renaming again n the ind file.
 
Last edited:
7ZjpY3d.png


^^models look statistically robust. Pvals are within biological relevance, and standard errors are relatively low. They're viable models.

Good job!

Guess the biological relevance here proves some sort of Fallmerayer adaptation for North Italians then?
If you can't see what is wrong with that model, read my next reply to Paleo.
 
Last edited:
I asked eupator to run a model for me and all I got was a thumbs up. So I rolled my sleeves followed the instruction from the main thread and I got it running. Suprisingly it didn't take long, but running models and modifying sources/left command was time consuming. Those two were first models.

Bassarabi is based on G25 autosomal run. I also checked in qpad, in a general model they all pass. In my current model which is much more precise MJ12 and Hungary(E-V13) I18832 pass, MKD I10379 passes with MJ12 but not I18832. MJ12 is the bridge that works with both. I think the SNP coverage is not the same. for MKD I10379, either way, post-Kukes samples can all be run successfully with I18832. There will be a time when new samples will make better component.

I think I am at good spot, and will not bother running any more models or trying new ideas. Example of how far I reached.

Ea1C4n8.png

FQKBjPE.png

W0U7YGa.png


Middle one is a slight fail but shared it anyway.

That's much more like it.
It is good practice to post your tail when you do runs so other people can replicate them, as they play a big role in the results.
The problem here is that using asynchronous populations can lead to mishaps like the one I replied to Jovialis. In principle you can model all Europeans as CHG, EHG, and EEF. So if you use EHG and CHG in the component populations you could potentially add Sardinians, and woila, you proved half of Europe descends from Sardinians. I think you get the logic.

Point here is while the Late Antiquity Naisus sample could be viable for all we know, but this is not the way to test it. Where would you find Satsurblia like unadmixed profile in Late Antiquity Balkans? But more importantly, how can you label a Baltic_BA aggregate of samples as Slavic? You ought to do more research, sadly the very good thread at Anthro on Slavic ethnogenesis is no longer, learned a lot from people like ph2ter and others, and to date the earliest most reliable sample for Slavs in general, but especially in the Balkans in av2. And I think some other samples not in the Reich Dataset from Prague culture or something.

I will publish some very interesting finds I noticed recently, have been sitting on them and doing some research on the free time, as they are quite surprising, and counterintuitive, given what little we know from the Migration Period in the southern Balkans.

So the bigger picture is, this models are only tools that can show you what likely did not happen, and what "could" have happened with a confidence level, in this case 95%. Then it is your job to make sure:
1. Use contemporary populations that had access to each other to procreate, meaning no Baltic BA, or Georgia Sats.. in late antiquity Balkans.
2. Make sure your tail is properly designed for the specific test and does not overfit. There is a paper I have shared many times on this.
3. Try to use samples sequenced with the same tech(?), try not to mix noUDG, .SG, .DG etc together. Sample quality plays a huge role in the numbers you see on these models.


aYkaQ5V.png


As you see what you posted, does not mean what you think it did. But I guess that was obvious with my reply to Jovialis. (Sadly my properly labeled Reich dataset is the one that has the .HO samples for more modern references, so I could not run the Georgia Sats.. test, but I guess the points above make it obvious why that is redundant.

I saw in another thread you could not get to model some(or all) of the Albanian mdv samples? Maybe its your tail that is the issue, or maybe something else. Generally they tend to be 2/3 Alb_BA_IA 1/3 something Eastern / Anatolian.

Which made me remember. Do not just relabel the Reich Dataset labels as you please. Its disingenuous. Maybe add the years for clarity, or in cases like AV2 describe what makes them an outlier based on the facts presented in the paper they were published. But Baltic_BA_Slavic? lol

Ps. The tail I use here is based on the latest Danubian Limes paper, the labels are the same, but tweaked for my specific experiments.

right = c("OldAfrica", "Steppe_BA", "EHG", "Iron_Gates_HG", "Anatolia_N", "Ukraine_EBA_Yamnaya", "Sudan_EarlyChristian", "Kazakhstan_EarlySarmatian","Czech_EarlySlav_660-770",
"Iran_N", "Greece_Minoan","Croatia_MBA_Cetina", "Czech_IA_Hallstatt", "Bulgaria_IA",
"Steppe_IA","SoutheastTurkey_MLBA", "Baltic_BA")

Bonus~ example of asynchronous model not dissimilar to what you are doing:
CTXqI4q.png
 
Last edited:
7ZjpY3d.png




Guess the biological relevance here proves some sort of Fallmerayer adaptation for North Italians then?
If you can't see what is wrong with that model, read my next reply to Paleo.
I was replying to post 30 about Albanians. I wasn't actually following the thread, just looking at those figures for that specific post.
 
That's much more like it.
It is good practice to post your tail when you do runs so other people can replicate them, as they play a big role in the results.
The problem here is that using asynchronous populations can lead to mishaps like the one I replied to Jovialis. In principle you can model all Europeans as CHG, EHG, and EEF. So if you use EHG and CHG in the component populations you could potentially add Sardinians, and woila, you proved half of Europe descends from Sardinians. I think you get the logic.

The real question is how would you model north Italians as Nausius and Slavic, your parameters are stretched to produce comical results.

8W6CuJO.png


Those parameters also model you as Illyrian plus Slavic, while my models are always within cline. This does show that qpdam is nowhere near full proof. Your Illyrian model is forced.


Point here is while the Late Antiquity Naisus sample could be viable for all we know, but this is not the way to test it. Where would you find Satsurblia like unadmixed profile in Late Antiquity Balkans? But more importantly, how can you label a Baltic_BA aggregate of samples as Slavic? You ought to do more research, sadly the very good thread at Anthro on Slavic ethnogenesis is no longer, learned a lot from people like ph2ter and others, and to date the earliest most reliable sample for Slavs in general, but especially in the Balkans in av2. And I think some other samples not in the Reich Dataset from Prague culture or something.

I did not claim Estonia_BA is Slavic., show me where I stated that. It has the Slavic drift, this was well known and discussed in Anthrogenica, where your favorite, Brumziu tried hard to argue against it, against some top-notch northern Slav who knew his stuff.
And don't be condescending, take a look at your model, you stretched out the weights to produce unrealistic results.
My Slavic mdv values are better than yours. You know there are some unmixed Slavic samples among the Vikings samples right? Did you know that mr. research?


I will publish some very interesting finds I noticed recently, have been sitting on them and doing some research on the free time, as they are quite surprising, and counterintuitive, given what little we know from the Migration Period in the southern Balkans.

So the bigger picture is, this models are only tools that can show you what likely did not happen, and what "could" have happened with a confidence level, in this case 95%. Then it is your job to make sure:
1. Use contemporary populations that had access to each other to procreate, meaning no Baltic BA, or Georgia Sats.. in late antiquity Balkans.
2. Make sure your tail is properly designed for the specific test and does not overfit. There is a paper I have shared many times on this.
3. Try to use samples sequenced with the same tech(?), try not to mix noUDG, .SG, .DG etc together. Sample quality plays a huge role in the numbers you see on these models.

Just like G25, qpdam can be used as metal detector, and nothing else. The only way to truly prove it, is to have relevant samples every 500 years, where you go by the best fitting model. Such ideal dataset will not be available for the meantime.

The issue with qpdam is that it has hick ups when there is shared drift. In reality, the Alb parent population was largely IA but still had acquired some MENA admixture. And the additional MENA that came from Byzantine period had acquired some steppe admixture. These two proxies are problematic, qpdam wants to work with cleaner coordinates, that's why EST_BA passes, while Slavic_mdv fails, not because EST_BA is the real Slavic, but qpdam likes proxies with distances/distinction. My models are not non-sense. I kept my parameters free of bias.
For Qpdam to work the way you want it. You will need to find the parent Alb population mixed with MENA before Salvic admixture occured. With the current samples, I say good luck with that. And it's frankly never going to happen, because some MENA admixture occured before Slavs, and later it came with Balkan Slavs. You will not get the recontruction/model you desire, because of how the vriables were formed.

As you see what you posted, does not mean what you think it did. But I guess that was obvious with my reply to Jovialis. (Sadly my properly labeled Reich dataset is the one that has the .HO samples for more modern references, so I could not run the Georgia Sats.. test, but I guess the points above make it obvious why that is redundant.

I saw in another thread you could not get to model some(or all) of the Albanian mdv samples? Maybe its your tail that is the issue, or maybe something else. Generally they tend to be 2/3 Alb_BA_IA 1/3 something Eastern / Anatolian.

You should try modeling kenete and shtike seperately, as they are not the same.

Which made me remember. Do not just relabel the Reich Dataset labels as you please. Its disingenuous. Maybe add the years for clarity, or in cases like AV2 describe what makes them an outlier based on the facts presented in the paper they were published. But Baltic_BA_Slavic? lol

Baltic_BA was used as a Slavic drift proxy, you do not get that? You think my master plan was to show low Slavic admixture?
Salvic drift using EST_BA average in G25
lp698LS.png


right = c("OldAfrica", "Steppe_BA", "EHG", "Iron_Gates_HG", "Anatolia_N", "Ukraine_EBA_Yamnaya", "Sudan_EarlyChristian", "Kazakhstan_EarlySarmatian","Czech_EarlySlav_660-770",
"Iran_N", "Greece_Minoan","Croatia_MBA_Cetina", "Czech_IA_Hallstatt", "Bulgaria_IA",
"Steppe_IA","SoutheastTurkey_MLBA", "Baltic_BA")

Try mine.

right = c('Cameroon_SMA', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'Russia_West_Siberia_HG', 'Serbia_IronGates_Mesolithic', 'Karitiana.DG', 'Papuan.DG', 'Armenia_LBA.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_LN.SG', 'Cyprus_C', 'Russia_Boisman_MN', 'Romania_C_Bodrogkeresztur', 'Croatia_EIA', 'Netherlands_EIA', 'Russia_Samara_EBA_Yamnaya', 'Czech_CordedWare', 'Lithuania_EMN_Narva', 'Turkey_Arslantepe_LateC', 'Israel_C', 'Iraq_PPNA', 'Lebanon_ERoman.SG', 'ONG.SG')
 
Last edited:
7ZjpY3d.png




Guess the biological relevance here proves some sort of Fallmerayer adaptation for North Italians then?
If you can't see what is wrong with that model, read my next reply to Paleo.
Like PCA, qpAdm also has to be confirmed by other metrics, as well as archeology and history.
 
Like PCA, qpAdm also has to be confirmed by other metrics, as well as archeology and history.

I posted the target and variables in a PCA, does Nasius + Slavic look legit for northern Italian? With a stellar fit too.
8W6CuJO.png
 
Bonus~ example of asynchronous model not dissimilar to what you are doing:
CTXqI4q.png

I ran the same model in my tail.
HUcn83M.png


Your tail is out of whack. If it's what the Serbian team produced, it is junk, 30+ full-time researchers were devoted to that. The so called scientists.
I based mine through trial and error of two long days.
 

This thread has been viewed 5182 times.

Back
Top