admixtools2 TUTORIAL for WINDOWS.

@eupator
i ran the f3 comparisons:

with the HO dataset, sorted from the most similar population to least:
Code:
pop1    pop2    pop3    est    se    z    p
Mbuti    Greece_BA_Mycenaean    Sardinian    0.0692    0.000538    129    0
Mbuti    Greece_BA_Mycenaean    Italian_North    0.0686    0.000534    129    0
Mbuti    Greece_BA_Mycenaean    Albanian    0.0683    0.000548    125    0
Mbuti    Greece_BA_Mycenaean    Greek    0.0682    0.00053    129    0
Mbuti    Greece_BA_Mycenaean    Greek.WGA    0.0681    0.000528    129    0
Mbuti    Greece_BA_Mycenaean    Bulgarian    0.068    0.000535    127    0
Mbuti    Greece_BA_Mycenaean    Cretan.DG    0.0677    0.000596    114    0
Mbuti    Greece_BA_Mycenaean    Greek_outlier.WGA    0.0677    0.000579    117    0
Mbuti    Greece_BA_Mycenaean    Sicilian    0.0675    0.00054    125    0
Mbuti    Greece_BA_Mycenaean    Armenian    0.0674    0.000527    128    0
Mbuti    Greece_BA_Mycenaean    Italian_South    0.0674    0.000536    126    0
Mbuti    Greece_BA_Mycenaean    Turkish    0.0667    0.000519    129    0
Code:
Mbuti    Greece_Minoan_Lassithi    Sardinian    0.0693    0.000489    142    0
Mbuti    Greece_Minoan_Lassithi    Italian_North    0.0684    0.000487    140    0
Mbuti    Greece_Minoan_Lassithi    Albanian    0.0682    0.000503    135    0
Mbuti    Greece_Minoan_Lassithi    Greek    0.068    0.000486    140    0
Mbuti    Greece_Minoan_Lassithi    Bulgarian    0.0678    0.000487    139    0
Mbuti    Greece_Minoan_Lassithi    Greek.WGA    0.0678    0.000481    141    0
Mbuti    Greece_Minoan_Lassithi    Cretan.DG    0.0677    0.00055    123    0
Mbuti    Greece_Minoan_Lassithi    Italian_South    0.0677    0.000495    137    0
Mbuti    Greece_Minoan_Lassithi    Greek_outlier.WGA    0.0675    0.000532    127    0
Mbuti    Greece_Minoan_Lassithi    Sicilian    0.0673    0.000497    136    0
Mbuti    Greece_Minoan_Lassithi    Armenian    0.0672    0.000485    139    0
Mbuti    Greece_Minoan_Lassithi    Turkish    0.0665    0.000474    140    0

with the 1240k dataset, to check if it's consistent when more SNPs are used:
Code:
Mbuti.SDG    Greece_Minoan_Lassithi    Sardinian.SDG    0.0739    0.000528    140    0
Mbuti.SDG    Greece_Minoan_Lassithi    Albanian.DG    0.0731    0.000608    120    0
Mbuti.SDG    Greece_Minoan_Lassithi    Greek.DG    0.073    0.000554    132    0
Mbuti.SDG    Greece_Minoan_Lassithi    Italian_North.SDG    0.073    0.000524    139    0
Mbuti.SDG    Greece_Minoan_Lassithi    Tuscan_1.DG    0.0728    0.000572    127    0
Mbuti.SDG    Greece_Minoan_Lassithi    Bulgarian.DG    0.0725    0.000543    133    0
Mbuti.SDG    Greece_Minoan_Lassithi    Cretan.DG    0.0724    0.000575    126    0
Mbuti.SDG    Greece_Minoan_Lassithi    Armenian.DG    0.0715    0.000555    129    0
Mbuti.SDG    Greece_Minoan_Lassithi    Turkish.DG    0.071    0.000582    122    0

code used to run it:

Code:
library(admixtools)
prefix = 'v50.0_HO_public'
my_f2_dir = 'myf2'


extract_f2(pops=c('Mbuti','Greece_Minoan_Lassithi','Greek','Cretan.DG','Greek.WGA','Greek_outlier.WGA','Albanian','Bulgarian','Italian_South','Italian_South','Sicilian','Sardinian','Italian_North','Armenian','Turkish'),maxmem=4000,prefix,my_f2_dir, fst=FALSE, afprod=FALSE,maxmiss = 1)


f2_blocks = f2_from_precomp(my_f2_dir)


pop1 = 'Mbuti'
pop2= 'Greece_Minoan_Lassithi'
pop3 = c('Greek','Cretan.DG','Greek.WGA','Greek_outlier.WGA','Albanian','Bulgarian','Italian_South','Italian_South','Sicilian','Sardinian','Italian_North','Armenian','Turkish')


qp3pop(f2_blocks, pop1, pop2, pop3)
 
Again, I don't know how to interpret the negative values but if that means closer related than the positive ones then Greek_2.DG is closest to Greek_1.DG. I'm not sure I'm reading this correctly but why would Greek_2.DG be closer to Greek_1.DG than Albanians.DG or Bulgarian.DG are? Perhaps some elevated West Asian ancestry in Greek_1.DG which qpAdm failed to capture or perhaps Greek_2.DG isn't fully Pontic Greek but has some minor mainland Greek Macedonian ancestry?
When I take into account Armenian.DG fst distances then there isn't anything irregular and Greek_1.DG is even further away from Armenian.DG than Bulgarian.DG is:

Genetic distance is calculated as a product / multiplication/ of multidimentional vectors. When the 2 vectors are perpendicular, their scalar product =0. If we take the 2 dimentional space - it will be a circle, all the dots in this cercle will have 0 distance. In 3D - it wiil be a sphere. The distance between dots insice the sphere will be negative. For multidimentional space it is similar, but more complicated.
Here is some explanation:

royalsocietypublishing.org/doi/10.1098/rstb.2020.0413
doi.org/10.1098/rstb.2020.0413
It is similar with F3 / F4 statistics.
 
The best part of admixtools2 for me is F3 /F4 statistics. Especially F3. I tested some other functions, but don't seems to be much useful for me.
qpgraph / QPWave is great too.. However depending on the selection of individuals / populations the results from QPwave could vary too much.
 
Thank you. If you get stuck somewhere, I will periodically check this post to see if I can help with anything. If you can't find the website links for the applications or the github page, PM me.

Also, the FST distance command, adjusted for pseudohaploid would look like this:

fst(prefix, pop1 = "Kurd", pop2 = c("Turkish", "Georgian", "Armenian_Hemsheni", "Armenian", "Ossetian", "Azeri"), adjust_pseudohaploid = FALSE)

How would one interpret the results?

Due to this error:
Error in f2_from_geno(f2_data, pops = union(pops, pops2), afprod = afprod, :
No blocks remain after discarding blocks with missing values! If you want to compute FST for pseudohaploid samples, set `adjust_pseudohaploid = FALSE`

I had to include the pseudohaploid fix, but not sure that the data makes sense now.

D06I7WO.png


Not sure why I got the error to being with, as far as I can tell I did not mix DG/SG with non DG/SG samples. (edit: not in the screenshot run, in other runs where Sardinian was missing still got the error)
 
How would one interpret the results?

Due to this error:


I had to include the pseudohaploid fix, but not sure that the data makes sense now.

Not sure why I got the error to being with, as far as I can tell I did not mix DG/SG with non DG/SG samples. (edit: not in the screenshot run, in other runs where Sardinian was missing still got the error)


You correctly added the command for the pseudohaploid, unfortunately there's not much else you can do. You can identify which samples require it (regardless of their naming) by going through their details in the .anno file.
 
You correctly added the command for the pseudohaploid, unfortunately there's not much else you can do. You can identify which samples require it (regardless of their naming) by going through their details in the .anno file.

Thanks for clearing that up.
Another question assuming you have played with f3. I managed to combine your post and this https://uqrmaie1.github.io/admixtools/articles/admixtools.html to patch up what I think is a f3 script:
prefix = "C:/Users/xyz/Desktop/fstats/ReichData3/SouthernArc_Public"
my_f2_dir = "C:/Users/xyz/Desktop/fstats/fdir"
library(admixtools)
library(tidyverse)


mypops = c('', '', '', '', '', '')
pop1 = ''
pop2 = c('', '', '', '')
pop3 = c('')




extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)
qp3pop(f2_blocks, pop1, pop2, pop3)


if (FALSE) {
qp3pop(f2_dir, pop1, pop2, pop3)
}

I tried reading the tutorial and even this paper https://academic.oup.com/genetics/article/202/4/1485/5930214 , yet I still have no clue how to interpret this:

7sDyPTa.png


I am guessing something like this?

ju09asz.png


But don't understand what is the cutoff between a viable and non viable model? Reading this https://reich.hms.harvard.edu/sites...iles/inline-files/preprints202003.0237.v1.pdf it seems its the Z score? If so what would be the cutoff for rejecting the model?
 
I haven't done much with f3 so far, just f2 and f4 for qpadm. You should keep at it!
 
.. I still have no clue how to interpret this:

p - the probability if there is a connection/ relation between pop2 and pop1 / pop3

se- standard error
est - The estimated [FONT=MathJax_Main]F3[/FONT]-statistics

In your case here, there is no relation between pop2 and 2 others. The probability = 0.
Z is too big and positive. If they were related - Z would be near 0 or negative.
 
p - the probability if there is a connection/ relation between pop2 and pop1 / pop3

se- standard error
est - The estimated [FONT=MathJax_Main]F3[/FONT]-statistics

In your case here, there is no relation between pop2 and 2 others. The probability = 0.
Z is too big and positive. If they were related - Z would be near 0 or negative.

Thanks for the neat explanation.
What would the cutoff for p and Z be? p >= .95 ? p>0.05?
Also, if I understood this correctly, pop2 is the focus of the test in relation to the other two pops?
 
Thanks for the neat explanation.
What would the cutoff for p and Z be? p >= .95 ? p>0.05?
Also, if I understood this correctly, pop2 is the focus of the test in relation to the other two pops?

They can use it both way:

1. To test if c is in the middle between a,b.


2. To test how far is c from a and b.

F3=<(c-a)(c-b)>
 
Thanks for the neat explanation.
What would the cutoff for p and Z be? p >= .95 ? p>0.05?
Also, if I understood this correctly, pop2 is the focus of the test in relation to the other two pops?

Actually: it is the Pop3 in the focus. We test if Pop3 is in the middle between pop1/pop2 or how far is pop3 on one side or another.

This is good to know:
F3(A,B,C) = F4(A,B, A,C)

https://uqrmaie1.github.io/admixtools/reference/qp3pop.html

In the description of the function they generaly tell if pop1 is A , pop2 =B and pop3 = C.

Also the results will show you the configuration. You may replace the position of the populations and see how this will change your results and sign positive / negative.
 
Actually: it is the Pop3 in the focus. We test if Pop3 is in the middle between pop1/pop2 or how far is pop3 on one side or another.

This is good to know:
F3(A,B,C) = F4(A,B, A,C)

https://uqrmaie1.github.io/admixtools/reference/qp3pop.html

In the description of the function they generaly tell if pop1 is A , pop2 =B and pop3 = C.

Also the results will show you the configuration. You may replace the position of the populations and see how this will change your results and sign positive / negative.

Thanks a lot for the insights mate, apreciate it.

Did you ever get to play with qpGraph? If so where would the output of this be saved?
qpg_results = qpgraph(f2_blocks, example_graph)qpg_results$score
plot_graph(qpg_results$edges)

To view something like this?

unnamed-chunk-40-1.png



Not sure if my script is even correctly set up to do that? But in the console it looks like I get an output successfully, just have no clue where it is? Does not seem to be in the dir... so maybe the script is wrong after all. I just added the first quoted part to my usual f2 script.
MUAXZz1.png


GMhbp4N.png
 
Did you ever get to play with qpGraph? If so where would the output of this be saved?


To view something like this?

unnamed-chunk-40-1.png



Not sure if my script is even correctly set up to do that? But in the console it looks like I get an output successfully, just have no clue where it is? Does not seem to be in the dir... so maybe the script is wrong after all. I just added the first quoted part to my usual f2 script.
MUAXZz1.png


GMhbp4N.png

The easiest way - you take a screenshot or save the results as png or pdf file.
However you may save it in another file - for example text file. I created my own scripts to save such data. I can share, but it's not perfect. It just work for me.

To save:

# Find the best result
min_score <- which.min(opt_results$score)




#min_score
which.min(opt_results$score[])
# opt_results$score[48]
opt_results$score[which.min(opt_results$score[])]
# opt_graph <- opt_results$graph[73] -> this graph is not useful
opt_graph1 <- opt_results$edges[which.min(opt_results$score[])]
data1 <- opt_graph1[1]


write.table(data1, file = My_output_filename,
row.names = F, col.names = T)


### Initially I was checking what is the opt result and adjust the number manually, then save it as text file. Then I did it in easiest way - save the opt results directly .. Then to re-open: here is an example:

## re-open results
setwd("D:/Documents/genetics/v50_1240_HO/geno/Results/QPGraph")
###### Reload from saved file


My_input_filename <- "CAR_graph.txt"


opt_results_graph_from_File <- read.table( My_input_filename , header = T)
# opt_results_graph_from_File <- read.table("Results/opt_results_opt_graph.txt", header = T)
# to print saved results ( graph + weight percentage per population)
plotly_graph(opt_results_graph_from_File)


plot_graph(opt_results_graph_from_File)


## there are 2 versions to plot - I like more the plotly version.
 
Hey eupator and bgtrak, I managed to figure out the plink, merge method, to use my raw file in qpADM analysis. But the results seem a bit nonsensical. I assume one of you might have done the same, and wanted some feedback.

hTDkvoI.png


wXeA0D4.png


outputs:
> rm(list = ls())>
> setwd("C:/plink")
>
> #make raw into plink
> system("plink --23file A01.txt --recode --out A01")
PLINK v1.90b6.26 64-bit (2 Apr 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to A01.log.
Options in effect:
--23file A01.txt
--out A01
--recode


16219 MB RAM detected; reserving 8109 MB for main workspace.
--23file: A01-temporary.bed + A01-temporary.bim + A01-temporary.fam written.
Inferred sex: female.
627678 variants loaded from .bim file.
1 person (0 males, 1 female) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.981479.
627678 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--recode ped to A01.ped + A01.map ... done.
[1] 0
>
> #make geno into plink
> packedancestrymap_to_plink("C:/Users/xxx/Desktop/fstats/ReichData3/SouthernArc_Public", "test",
+ pops =
+ c("ALB_Mdv", "ALB_PostMdv", "ALB_NChL", "ALB_N", "ALB_Çinamak_Anc",
+ "TUR_Marmara_Ilıpınar_Byz3", "GRC_Marathon_Rom", "UKR_Yamnaya", "HRV_Cetina_BA",
+ "GRC_Logkas_MBA2", "ALB_MBA", "BGR_KapitanAndreevo_IA", 'Mbuti.DG','GEO_Kotias',
+ 'RUS_Karelia_HG','ITA_Villabruna','MAR_Taforalt_EpiP','RUS_Tyumen_HG','IRN_Ganj_Dareh_N',
+ 'JOR_PPNB','TUR_Marmara_Menteşe_N','TJK_Sarazm_EN','RUS_DevilsCave_N')
+ )
ℹ SouthernArc_Public.geno has 5940 samples and 1233013 SNPs.
ℹ Reading data for 94 samples and 1233013 SNPs
ℹ Expected size of genotype data: 1065 MB
1233k SNPs read...
✔ 1233013 SNPs read in total
Writing: C:\plink\test.bed
Writing: C:\plink\test.bim
Writing: C:\plink\test.fam
>
> system("plink --bfile test --recode --out test")
PLINK v1.90b6.26 64-bit (2 Apr 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to test.log.
Options in effect:
--bfile test
--out test
--recode


16219 MB RAM detected; reserving 8109 MB for main workspace.
1233013 variants loaded from .bim file.
94 people (0 males, 0 females, 94 ambiguous) loaded from .fam.
Ambiguous sex IDs written to test.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 94 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.424371.
1233013 variants and 94 people pass filters and QC.
Note: No phenotypes present.
--recode ped to test.ped + test.map ... done.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
[1] 0
>
> system("plink --file A01 --make-bed")
PLINK v1.90b6.26 64-bit (2 Apr 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--file A01
--make-bed


16219 MB RAM detected; reserving 8109 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (627678 variants, 1 person).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
627678 variants loaded from .bim file.
1 person (0 males, 1 female) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.981479.
627678 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--make-bed to plink.bed + plink.bim + plink.fam ... done.
[1] 0
>
> system(paste0("plink --file test --merge A01 ", "--recode --merge-equal-pos --out mergeda01"))
PLINK v1.90b6.26 64-bit (2 Apr 2022) www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to mergeda01.log.
Options in effect:
--file test
--merge A01
--merge-equal-pos
--out mergeda01
--recode


16219 MB RAM detected; reserving 8109 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (1233013 variants, 94 people).
--file: mergeda01-temporary.bed + mergeda01-temporary.bim +
mergeda01-temporary.fam written.
94 people loaded from mergeda01-temporary.fam.
1 person to be merged from A01.ped.
Of these, 1 is new, while 0 are present in the base dataset.
1233013 markers loaded from mergeda01-temporary.bim.
627678 markers to be merged from A01.map.
Of these, 460666 are new, while 167012 are present in the base dataset.
Performing single-pass merge (95 people, 1691831 variants).
Merged fileset written to mergeda01.bed + mergeda01.bim + mergeda01.fam .
1691831 variants loaded from .bim file.
95 people (0 males, 1 female, 94 ambiguous) loaded from .fam.
Ambiguous sex IDs written to mergeda01.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 95 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.307694.
1691831 variants and 95 people pass filters and QC.
Note: No phenotypes present.
--recode ped to mergeda01.ped + mergeda01.map ... done.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
[1] 0


2RBJRj7.png


Considering how much time I spent trying to figure this out, its quite disappointing.
I initially thought me having to use dnakitstudio to convert ftdna raw into 23andme raw, was the issue, but I tried it with someone elses 23andme raw and the qpADM run was as nonsensical.
 

The prefix needs to be the path linking to the .geno file that you are using 1240K/or HO (and the name you've given it, if you have) w/o the .geno extension at the end. Be careful and specific, a lot of people get temp stuck here.

Eupator, I didn't get what you mean here. Do you mean I should delete the .geno extension from the name of the file, despite the warning message that its contents might change if I do so?
 
I assume this is the cause of the following error message:

Error in anygeno_to_aftable(pref, inds = inds, pops = pops, format = format, :
Genotype files not found!
 
Just omit the ".geno" in your path, don't delete anything.
 
Thanks a lot. Still not working, as I get the same error message...Must be smth else.

It can't find your geno file, your path is wrong.
 

This thread has been viewed 30949 times.

Back
Top