Admixtools admixtools2 TUTORIAL for WINDOWS.

How do you fix this? Unable to run any admixtools command, get this error. Have it on both computers I tried installing rstudio.

> fst(prefix, pop1 = "Armenian.DG", pop2 = c("French.DG", "English.DG", "Spanish.DG", "Russian.DG", "Polish.DG", "Sardinian.DG", "Hungarian.DG", "Basque.DG", "Greek_1.DG", "Estonian.DG", "Finnish.DG", "Orcadian.DG", "Bulgarian.DG", "Italian_North.DG", "Norwegian.DG", "Icelandic.DG", "Cretan.DG"))
ℹ Reading precomputed data for 18 populations...
Error in read_f2(dir, pops, pops2, type = type, remove_na = remove_na, :
block_lengths file not found. Please run extract_f2() again, as the file format has recently changed.
 
You can also use the following command to calculate FST distances to a target population.

Here, I am going to do just that by comparing various moderns to the Israel_Natufian_published samples:

fst(prefix, pop1 = "Israel_Natufian_published", pop2 = c("Egyptian", "Greek", "Lebanese", "Jordanian", "Saudi", "Somali"))

The results look like this (lower is closer, taking the standard error into account):


A tibble: 6 × 4
pop1 pop2 est se

1 Israel_Natufian_published Egyptian 0.0715 0.00702
2 Israel_Natufian_published Greek 0.0874 0.00729
3 Israel_Natufian_published Jordanian 0.0789 0.00720
4 Israel_Natufian_published Lebanese 0.0735 0.00728
5 Israel_Natufian_published Saudi 0.0760 0.00755
6 Israel_Natufian_published Somali 0.109 0.00701

It's good practice not to mix .DG/.SG samples with non .DG/.SG, because there is inherent bias towards the first, if you absolutely need to do this, you can adjust the command by adding: adjust_pseudohaploid = FALSE, at the end of the command.
I was able to figure out how to do it this way:
Code:
# Define the prefix for your genotype files
prefix <- "D:/Bioinformatics/01_Admixtools_Dataset/v54.1.p1_HO_Jovialis_Plink/Skourtanioti_2023/v54.1.p1_HO_Jovialis"


# Define the directory where you want to save the FST results
my_f2_dir <- "D:/Bioinformatics/my_f2_dir_Jovialis"


# Define the main sample and the new list of populations to compare
single_pop <- "Jovialis"
populations <- c("R111_Italy_Imperial_C6.SG", "R113_Italy_Imperial_C6.SG", "R125_Italy_Imperial_C6.SG",
                 "R131_Italy_Imperial_C6.SG", "R1544_Italy_Imperial_C6.SG", "R1549_Italy_Imperial_C6.SG",
                 "R436_Italy_Imperial_C6.SG", "R47_Italy_Imperial_C6.SG", "R49_Italy_Imperial_C6.SG",
                 "R51_Italy_Imperial_C6.SG", "R835_Italy_Imperial_C6.SG", "R836_Italy_Imperial_C6.SG")


# Specify the output file path
output_file <- paste0(my_f2_dir, "/fst_results.tsv")


# Load necessary libraries
library(admixtools)
library(tidyverse)


# Initialize a tibble to store FST results
fst_results <- tibble(population = character(), fst = numeric(), se = numeric())


# Loop over each new population and calculate FST
for (pop in populations) {
  fst_out <- fst(prefix, single_pop, pop, adjust_pseudohaploid = FALSE)
  fst_results <- fst_results %>%
    add_row(population = pop, fst = fst_out$est, se = fst_out$se)
}


# Write the FST results to a TSV file
write_tsv(fst_results, output_file)


# Identifying the closest population
closest_pop <- fst_results %>%
  arrange(fst) %>%
  slice(1)


print(closest_pop)

Result:
Code:
> # Write the FST results to a TSV file
> write_tsv(fst_results, output_file)
>
> # Identifying the closest population
> closest_pop <- fst_results %>%
+     arrange(fst) %>%
+     slice(1)
>
> print(closest_pop)
# A tibble: 1 × 3
population                  fst      se
<chr>                     <dbl>   <dbl>
1 R835_Italy_Imperial_C6.SG 0.491 0.00345
>
 
I was able to figure out how to do it this way:
Code:
# Define the prefix for your genotype files
prefix <- "D:/Bioinformatics/01_Admixtools_Dataset/v54.1.p1_HO_Jovialis_Plink/Skourtanioti_2023/v54.1.p1_HO_Jovialis"


# Define the directory where you want to save the FST results
my_f2_dir <- "D:/Bioinformatics/my_f2_dir_Jovialis"


# Define the main sample and the new list of populations to compare
single_pop <- "Jovialis"
populations <- c("R111_Italy_Imperial_C6.SG", "R113_Italy_Imperial_C6.SG", "R125_Italy_Imperial_C6.SG",
                 "R131_Italy_Imperial_C6.SG", "R1544_Italy_Imperial_C6.SG", "R1549_Italy_Imperial_C6.SG",
                 "R436_Italy_Imperial_C6.SG", "R47_Italy_Imperial_C6.SG", "R49_Italy_Imperial_C6.SG",
                 "R51_Italy_Imperial_C6.SG", "R835_Italy_Imperial_C6.SG", "R836_Italy_Imperial_C6.SG")


# Specify the output file path
output_file <- paste0(my_f2_dir, "/fst_results.tsv")


# Load necessary libraries
library(admixtools)
library(tidyverse)


# Initialize a tibble to store FST results
fst_results <- tibble(population = character(), fst = numeric(), se = numeric())


# Loop over each new population and calculate FST
for (pop in populations) {
  fst_out <- fst(prefix, single_pop, pop, adjust_pseudohaploid = FALSE)
  fst_results <- fst_results %>%
    add_row(population = pop, fst = fst_out$est, se = fst_out$se)
}


# Write the FST results to a TSV file
write_tsv(fst_results, output_file)


# Identifying the closest population
closest_pop <- fst_results %>%
  arrange(fst) %>%
  slice(1)


print(closest_pop)

Result:
Code:
> # Write the FST results to a TSV file
> write_tsv(fst_results, output_file)
>
> # Identifying the closest population
> closest_pop <- fst_results %>%
+     arrange(fst) %>%
+     slice(1)
>
> print(closest_pop)
# A tibble: 1 × 3
population                  fst      se
<chr>                     <dbl>   <dbl>
1 R835_Italy_Imperial_C6.SG 0.491 0.00345
>
Are you sure ADMIXTOOLS 2 doesn't work with the latest R versions? For example, R-4.3.3.?
 
I haven’t tried this yet. Those of you that have used this does it produce similar results that one would get in g25? I viewed one other thread on it and on the samples used it was a very similar breakdown if I’m remembering correctly.
 
I haven’t tried this yet. Those of you that have used this does it produce similar results that one would get in g25? I viewed one other thread on it and on the samples used it was a very similar breakdown if I’m remembering correctly.
Check out this thread:


In addition to @eupator there's a few other members that could help you out.

There's times when it does replicate qpAdm, and times when it doesn't. Frankly, you should endeavor to use admixtools for optimal analysis. It is the suite of tools used by academics for peer-reviewed studies. Once you get it set up, it becomes second nature. I use AI to help me figure it out.
 
Check out this thread:


In addition to @eupator there's a few other members that could help you out.

There's times when it does replicate qpAdm, and times when it doesn't. Frankly, you should endeavor to use admixtools for optimal analysis. It is the suite of tools used by academics for peer-reviewed studies. Once you get it set up, it becomes second nature. I use AI to help me figure it out.
Are there the same samples I can play with like in G25 such as England_C_EBA:I2609, Ireland_Kilteasheen_AngloSaxon_EMedieval_Norman:KIL032, Italy_IA_Republic.SG:R1021, Jordan_EBA:I1705 etc.?
 
Are there the same samples I can play with like in G25 such as England_C_EBA:I2609, Ireland_Kilteasheen_AngloSaxon_EMedieval_Norman:KIL032, Italy_IA_Republic.SG:R1021, Jordan_EBA:I1705 etc.?

The Reich lab tends to have a nearly all the samples from studies leading up to the time it is published:

You would have to download the tar file from here:

You will also need a program like Visual Code Studio (a free software) to view to Geno file, which lists the names of the samples. It is also where you can modify the names to create groups.
 
I currently have the 1240K+HO version which I converted to PLINK format.

If you want to add your own DNA, you first have to make sure the raw data is in 23andme format, and aligned to HG19 format. That is the format that AADR uses.

You then have to use PLINK to convert the AADR files from eigenstrat format to PLINK format. Then convert your 23andme raw data to PLINK format, and do a one sided merge in PLINK, making sure to only include SNPs native to AADR.

It is a bit of a complex process, I used Chatgpt 4.0 to help me out.
 
First thing I would do is just set up Admixtools2 using the tutorial provided in this thread, and take it one step at a time. It is a bit overwhelming at first, but once it is set up, you'll be happy you did it. I'd space it out over a few days.

If you have a FASTQ file of 30X WGS, that would be ideal. Services like Nebula provide that.
 
First thing I would do is just set up Admixtools2 using the tutorial provided in this thread, and take it one step at a time. It is a bit overwhelming at first, but once it is set up, you'll be happy you did it. I'd space it out over a few days.

If you have a FASTQ file of 30X WGS, that would be ideal. Services like Nebula provide that.
Thanks for the info! Definitely will spread it out. Just by looking at it I could tell in split second it wasn't user friendly, especially if one has no coding/computer programming skills.
 
Thanks for the info! Definitely will spread it out. Just by looking at it I could tell in split second it wasn't user friendly, especially if one has no coding/computer programming skills.
That's precisely where Chatgpt 4.0 can help out.

Copy and paste the text from here, and ask it to give you a simplified step-by-step. It will even write the code out if you prompt it to:

 
Does admixtools2 work with R 4.3.3?
 
No, you need 4.2.3
For some reason it showhorns me into 4.3.3. I use windows 11 btw

1711902206383.png

1711902234363.png

1711902258777.png
 
Download and install R-Studio for Windows. https://www.rstudio.com/products/rstudio/

Download and install R-tools 4.2 for Windows. https://cran.r-project.org/bin/windows/Rtools/

After you complete the installation, run it (R-Studio) and install admixtools2 and dependencies by following the instructions on the INSTALLATION part of this page: admixtools2 on github.

https://uqrmaie1.github.io/admixtools/index.html

Finally, go to the Reich dataset page and download the sample files.
1240+HO dataset has manier moderns but less SNPs (500K), compared to 1240K. You can find the description of each sample in the .anno files. You can rename the samples (for example I renamed Kotias to CHG, Iran_Ganj_Dareh to Iran_N, etc) in the .ind file for your convenience w/o changing the order/position of each sample, but do not mix together .DG and .SG files to non .DG/.SG ones, the former are shotgun sequences.

In the command prompt of your R-Studio, you need the following commands (each time you run the program from scratch):


prefix = "C:/Users/eupator/Downloads/eupator_qpAdm_files/HO/v50.0_1240K_public"
my_f2_dir = "C:/Users/eupator/Downloads/eupator_qpAdm_files/my_f2_dir_eupator"
library(admixtools)
library(tidyverse)

The prefix needs to be the path linking to the .geno file that you are using 1240K/or HO (and the name you've given it, if you have) w/o the .geno extension at the end. Be careful and specific, a lot of people get temp stuck here.
The my_f2_dir needs to be a directory where the generated f2 stats are going to be stored each time, I named it my_f2_dir_eupator.

Now you are ready to do some runs, you require

a) a left list, which will be comprised of your target on top and its components below (4-5).
b) a right list, which is going to be your source populations (before the split) that the target is going to be compared against, there is an ongoing debate of how the right list should be built, some suggest using a few very distant populations, other insist on using recent source populations, you can use either methods or both (I use a mix). Some people suggest using no more than 15 pops, you can probably use up to 30.

Example:

I want to run the Lazaridis' modern Greek references, as a 3-way component model, in order to check the value of their ancient Greek (IA) part (empuries2), their Slavic part (I will use Polish.DG for this reason) and their West Asian part (I will use Armenian.DG for this reason).

I renamed all 3 empuries_2 samples (I8215, I8208, I8205) to "Greek_Emporion".

First, I will set my target and left with the following commands:


target = c('Greek')
left= c('Greek_Emporion','Polish.DG','Armenian.DG')

And for my right list, I use a modified (with more recents) Lazaridis et al. (2017) right list as a prototype (Mbuti.DG always on top):

right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')

Before I run the model, I need to generate the f2 stats for the total of both my left, right and target.

mypops = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG','Greek','Greek_Emporion','Polish.DG','Armenian.DG')
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

Now I can run the model using the following commands:

results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop

The model is a success. It has a good p-value (above 5%) of 0.0892 and low std. errors (around or below 5%).

> results$weights
# A tibble: 3 × 5
target left weight se z

1 Greek Greek_Emporion 0.416 0.0612 6.79
2 Greek Polish.DG 0.371 0.0391 9.48
3 Greek Armenian.DG 0.213 0.0458 4.66
> results$popdrop
# A tibble: 7 × 14
pat wt dof chisq p f4rank Greek_Emporion Polish.DG

1 000 0 18 26.5 8.92e- 2 2 0.416 0.371
2 001 1 19 48.6 2.07e- 4 1 0.630 0.370
3 010 1 19 93.0 9.83e-12 1 1.01 NA
4 100 1 19 76.6 7.17e- 9 1 NA 0.598
5 011 2 20 95.8 7.08e-12 0 1 NA
6 101 2 20 131. 2.74e-18 0 NA 1
7 110 2 20 313. 2.00e-54 0 NA NA
# … with 6 more variables: Armenian.DG , feasible ,
# best , dofdiff , chisqdiff , p_nested


The Greek reference can be successfully modeled as 41.6% Greek Emporion, 37.1% Polish.DG and 21.3% Armenian.DG.
@qh777

Follow the link in eupator's op
 

This thread has been viewed 33542 times.

Back
Top