admixtools2 TUTORIAL for WINDOWS.

eupator

destroyer of delusions
Messages
507
Reaction score
279
Points
63
Ethnic group
Rhōmaiōs (Rumelia + Anatolia)

Download and install R-Studio for Windows.
https://www.rstudio.com/products/rstudio/

Download and install R-tools 4.2 for Windows. https://cran.r-project.org/bin/windows/Rtools/

After you complete the installation, run it (R-Studio) and install admixtools2 and dependencies by following the instructions on the INSTALLATION part of this page: admixtools2 on github.

https://uqrmaie1.github.io/admixtools/index.html

Finally, go to the Reich dataset page and download the sample files. https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/

1240+HO dataset has manier moderns but less SNPs (500K), compared to 1240K. You can find the description of each sample in the .anno files. You can rename the samples (for example I renamed Kotias to CHG, Iran_Ganj_Dareh to Iran_N, etc) in the .ind file for your convenience w/o changing the order/position of each sample, but do not mix together .DG and .SG files to non .DG/.SG ones, the former are shotgun sequences.

In the command prompt of your R-Studio, you need the following commands (each time you run the program from scratch):

prefix = "C:/Users/eupator/Downloads/eupator_qpAdm_files/HO/v50.0_1240K_public"
my_f2_dir = "C:/Users/eupator/Downloads/eupator_qpAdm_files/my_f2_dir_eupator"
library(admixtools)
library(tidyverse)

The prefix needs to be the path linking to the .geno file that you are using 1240K/or HO (and the name you've given it, if you have) w/o the .geno extension at the end. Be careful and specific, a lot of people get temp stuck here.
The my_f2_dir needs to be a directory where the generated f2 stats are going to be stored each time, I named it my_f2_dir_eupator.

Now you are ready to do some runs, you require

a) a left list, which will be comprised of your target on top and its components below (4-5).
b) a right list, which is going to be your source populations (before the split) that the target is going to be compared against, there is an ongoing debate of how the right list should be built, some suggest using a few very distant populations, other insist on using recent source populations, you can use either methods or both (I use a mix). Some people suggest using no more than 15 pops, you can probably use up to 30.

Example:

I want to run the Lazaridis' modern Greek references, as a 3-way component model, in order to check the value of their ancient Greek (IA) part (empuries2), their Slavic part (I will use Polish.DG for this reason) and their West Asian part (I will use Armenian.DG for this reason).

I renamed all 3 empuries_2 samples (I8215, I8208, I8205) to "Greek_Emporion".

First, I will set my target and left with the following commands:

target = c('Greek')
left= c('Greek_Emporion','Polish.DG','Armenian.DG')

And for my right list, I use a modified (with more recents) Lazaridis et al. (2017) right list as a prototype (Mbuti.DG always on top):

right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')

Before I run the model, I need to generate the f2 stats for the total of both my left, right and target.

mypops = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','Greek','Greek_Emporion','Polish.DG','Armenian.DG')
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

Now I can run the model using the following commands:

results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop

The model is a success. It has a good p-value (above 5%) of 0.0892 and low std. errors (around or below 5%).

> results$weights
# A tibble: 3 × 5
target left weight se z

1 Greek Greek_Emporion 0.416 0.0612 6.79
2 Greek Polish.DG 0.371 0.0391 9.48
3 Greek Armenian.DG 0.213 0.0458 4.66
> results$popdrop
# A tibble: 7 × 14
pat wt dof chisq p f4rank Greek_Emporion Polish.DG

1 000 0 18 26.5 8.92e- 2 2 0.416 0.371
2 001 1 19 48.6 2.07e- 4 1 0.630 0.370
3 010 1 19 93.0 9.83e-12 1 1.01 NA
4 100 1 19 76.6 7.17e- 9 1 NA 0.598
5 011 2 20 95.8 7.08e-12 0 1 NA
6 101 2 20 131. 2.74e-18 0 NA 1
7 110 2 20 313. 2.00e-54 0 NA NA
# … with 6 more variables: Armenian.DG , feasible ,
# best , dofdiff , chisqdiff , p_nested


The Greek reference can be successfully modeled as 41.6% Greek Emporion, 37.1% Polish.DG and 21.3% Armenian.DG.
 
Last edited:
You can also use the following command to calculate FST distances to a target population.

Here, I am going to do just that by comparing various moderns to the Israel_Natufian_published samples:

fst(prefix, pop1 = "Israel_Natufian_published", pop2 = c("Egyptian", "Greek", "Lebanese", "Jordanian", "Saudi", "Somali"))

The results look like this (lower is closer, taking the standard error into account):

A tibble: 6 × 4
pop1 pop2 est se

1 Israel_Natufian_published Egyptian 0.0715 0.00702
2 Israel_Natufian_published Greek 0.0874 0.00729
3 Israel_Natufian_published Jordanian 0.0789 0.00720
4 Israel_Natufian_published Lebanese 0.0735 0.00728
5 Israel_Natufian_published Saudi 0.0760 0.00755
6 Israel_Natufian_published Somali 0.109 0.00701

It's good practice not to mix .DG/.SG samples with non .DG/.SG, because there is inherent bias towards the first, if you absolutely need to do this, you can adjust the command by adding: adjust_pseudohaploid = FALSE, at the end of the command.
 
Thanks a lot for this. Will try once I get some time. Already started by getting linux earlier in the yaer, but never got to it, hopefully this is the right nudge. Well done for sharing.
 
Thanks a lot for this. Will try once I get some time. Already started by getting linux earlier in the yaer, but never got to it, hopefully this is the right nudge. Well done for sharing.


Thank you. If you get stuck somewhere, I will periodically check this post to see if I can help with anything. If you can't find the website links for the applications or the github page, PM me.

Also, the FST distance command, adjusted for pseudohaploid would look like this:

fst(prefix, pop1 = "Kurd", pop2 = c("Turkish", "Georgian", "Armenian_Hemsheni", "Armenian", "Ossetian", "Azeri"), adjust_pseudohaploid = FALSE)
 
Thank you, great work.
 
Eupator, any idea why I get this error?

prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K_public"
my_f2_dir = "C:/Users/xyz/Desktop/fstats/fdir"
library(admixtools)
library(tidyverse)


target = c('S_Albanian-1.DG')
left= c('I4175','AV2','R437.SG')


right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')


mypops = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','S_Albanian-1.DG','I4175','AV2','R437.SG')
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)



results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop



Error in anygeno_to_aftable(pref, inds = inds, pops = pops, format = format, :
Genotype files not found!

Google didn't help much.

Edit: While troubleshooting it seems the highlighted part is where the trouble arises.
 
Your directory is set wrong.

If you didn't rename the .geno filename, your directory should be set as:

prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K_public/v50.0_1240K_public

It needs to target the .geno file (w/o the extension).

Edit: also, if you rename files, you need to do so in the 2nd column, don't touch the 1st column in the .ind file.
 
The naming in your target = c('Albanian.DG') and elsewhere, should be the one in the 2nd column, not the 1st. It looks like you used the latter with S_Albanian-1.DG. You will have to rename that position in the 2nd column from Albanian.DG to S_Albanian-1.DG if you want to separate it from the average (there's 2 Albanian.DG files in there, if I remember).

Maybe, you get the error because you are referencing the 1st column in your .ind file.

Check it out, also.
 
Your directory is set wrong.

If you didn't rename the .geno filename, your directory should be set as:

prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K_public/v50.0_1240K_public

It needs to target the .geno file (w/o the extension).

Edit: also, if you rename files, you need to do so in the 2nd column, don't touch the 1st column in the .ind file.

Edit: So this should fix it?

prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K_public/v50.0_1240K_public"
my_f2_dir = "C:/Users/xyz/Desktop/fstats/fdir"
library(admixtools)
library(tidyverse)


target = c('Albanian.DG')
left= c('Croatia_LateC_EBA_Vucedol','Hungary_Avar_5_daughter.or.mother.AV1','Italy_IA_Republic_oEasternMediterranean.SG')


right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')


mypops = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','S_Albanian-1.DG','Croatia_LateC_EBA_Vucedol','Hungary_Avar_5_daughter.or.mother.AV1','Italy_IA_Republic_oEasternMediterranean.SG')
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)


results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop


I am doing something wrong for sure.
 
Edit: Trying second fix, will update here.

Check my edit above this post of yours. I think you used the wrong referencing (1st column rather than the 2nd), I should have stated this explicitly in the OP.
 
I am doing something wrong for sure.

What error are you getting now?

You need to make sure there's no spelling mistakes, or capitalization, or missing apostrophes/commas.

Also, why don't you rename the AV samples into AV1 and AV2 in their respective 2nd column and just use that like 'AV1' and/or 'AV2'.
 
You have installed R-tools, right?
 
Can you write the full directory the .geno file is in, including the file also, copy paste it from the explorer line.
 
I am seeing now in your mypops list the Albanian reference is still listed as S_Albanian-1.DG and not Albanian.DG like you have it in your target line.
 
Same error I initially got at the same spot, whether I used 1st column names or second column names. Making me think maybe I am missing some libraries?


r7rGqDs.png


R9tkWhH.png




prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K_public/v50.0_1240K_public"
my_f2_dir = "C:/Users/xyz/Desktop/fstats/fdir"
library(admixtools)
library(tidyverse)


target = c('Albanian.DG')
left= c('Croatia_LateC_EBA_Vucedol','Hungary_Avar_5_daughter.or.mother.AV1','Italy_IA_Republic_oEasternMediterranean.SG')


right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')


mypops = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','Albanian.DG','Croatia_LateC_EBA_Vucedol', 'Hungary_Avar_5_daughter.or.mother.AV1', 'Italy_IA_Republic_oEasternMediterranean.SG')
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)


results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop

Ps: Tried to send you a pm to thank you for your patience, but your inbox is full.
 
You have 2 spaces that need to be removed in the Avar and Italy_IA samples in your left command line, but that might be the eupedia text editor messing up and not displaying properly.

Other than that, I don't see anything, I ran your commands and they work fine:

Reading allele frequencies from packedancestrymap files...
ℹ v50.0_1240K_public.geno has 10391 samples and 1233013 SNPs
ℹ Calculating allele frequencies from 115 samples in 25 populations
ℹ Expected size of allele frequency data: 385 MB
1233k SNPs read...
✔ 1233013 SNPs read in total
! 1150639 SNPs remain after filtering. 1037203 are polymorphic.
ℹ Allele frequency matrix for 1150639 SNPs and 25 populations is 313 MB
ℹ Computing pairwise f2 for all SNPs and population pairs requires 15647 MB RAM without splitting
ℹ Splitting into 2 chunks of 13 populations and up to 8000 MB (3 chunk pairs)
pop pair block 3 out of 3
ℹ Data written to C:\Users\eptr\Documents\v50.0_1240K_public\my_f2_dir_eptr\
> f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)
ℹ Reading precomputed data for 25 populations...
ℹ Reading ap data for pair 325 out of 325...
Warning message:
In read_f2(dir, pops, pops2, type = type, remove_na = remove_na, :
Discarding 1 block(s) due to missing values!
Discarded block(s): 538
> results = qpadm(prefix, left, right, target, allsnps = TRUE)
ℹ Reading metadata...
ℹ Computing block lengths for 1150639 SNPs...
ℹ Computing 60 f4-statistics for block 713 out of 713...
ℹ "allsnps = TRUE" uses different SNPs for each f4-statistic
Number of SNPs used for each f4-statistic:

...



ℹ Computing admixture weights...
ℹ Computing standard errors...
ℹ Computing number of admixture waves...


warning: solve(): system is singular (rcond: 2.13974e-17); attempting approx solution
> results$weights
# A tibble: 3 × 5
target left weight se z
<chr> <chr> <dbl> <dbl> <dbl>
1 Albanian.DG Croatia_LateC_EBA_Vucedol -0.617 0.671 -0.919
2 Albanian.DG Hungary_Avar_5_daughter.or.mother.AV1 0.551 0.362 1.52
3 Albanian.DG Italy_IA_Republic_oEasternMediterranea… 1.07 0.334 3.19
> results$popdrop
# A tibble: 7 × 14
pat wt dof chisq p f4rank Croatia_LateC_EBA_Vucedol
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 000 0 18 25.6 1.09e- 1 2 -0.617
2 001 1 19 48.1 2.45e- 4 1 1.75
3 010 1 19 63.1 1.23e- 6 1 -0.0398
4 100 1 19 43.0 1.30e- 3 1 NA
5 011 2 20 82.0 1.82e- 9 0 1
6 101 2 20 192. 3.63e-30 0 NA
7 110 2 20 70.9 1.28e- 7 0 NA
# … with 7 more variables: Hungary_Avar_5_daughter.or.mother.AV1 <dbl>,
# Italy_IA_Republic_oEasternMediterranean.SG <dbl>, feasible <lgl>,
# best <lgl>, dofdiff <dbl>, chisqdiff <dbl>, p_nested <dbl>

The model doesn't seem to work with Vucedol in it. I'll run it w/o and see what happens, but first, I want you to copy paste to me the path of .geno file, for example mine is

C:\Users\eptr\Documents\v50.0_1240K_public\v50.0_1240k_public.geno
 
You have 2 spaces that need to be removed in the Avar and Italy_IA samples in your left command line, but that might be the eupedia text editor messing up and not displaying properly.

Other than that, I don't see anything, I ran your commands and they work fine:



The model doesn't seem to work with Vucedol in it. I'll run it w/o and see what happens, but first, I want you to copy paste to me the path of .geno file, for example mine is

C:\Users\eptr\Documents\v50.0_1240K_public\v50.0_1240k_public.geno


"C:\Users\xyz\Desktop\fstats\ReichData\v50.0_1240K_public.geno"



I did install the packages as well as described here: https://github.com/uqrmaie1/admixtools
Really will give it another go with a fresh brain later.

Edit: Ye, I will definitively give it a look tomorrow. Nothing seems to fix this. Not even your most recent comment, and I would hate to keep tiring you. I am sure its some " ' " somewhere, lol.
 
OK, we found the error, this should be your command:

prefix = "C:/Users/xyz/Desktop/fstats/ReichData/v50.0_1240K _public"
my_f2_dir = "C:/Users/xyz/Desktop/fstats/fdir"
library(admixtools)
library(tidyverse)
 
It should work now, check the post above.
 
I ran your model but the p-values are very low, the reference samples are not very good for those Albanian samples. I've noticed in the past that AV1 and AV2 are hard to make them work.

This is the model with the best p-value, but it is still very low p-value=0.0271, way below the 5% treshold.

# A tibble: 2 × 5
target left weight se z
<chr> <chr> <dbl> <dbl> <dbl>
1 Albanian.DG Hungary_Avar_5_daughter.or.mother.AV1 0.257 0.0656 3.92
2 Albanian.DG Italy_IA_Republic_oEasternMediterranea… 0.743 0.0656 11.3
> results$popdrop
# A tibble: 3 × 13
pat wt dof chisq p f4rank Hungary_Avar_5_daughter.or.mot…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 00 0 19 32.5 2.71e- 2 1 0.257
2 01 1 20 182. 3.21e-28 0 1
3 10 1 20 64.0 1.66e- 6 0 NA
# … with 6 more variables:
# Italy_IA_Republic_oEasternMediterranean.SG <dbl>, feasible <lgl>,
# best <lgl>, dofdiff <dbl>, chisqdiff <dbl>, p_nested <dbl>
 

This thread has been viewed 30955 times.

Back
Top