• Don't want to see ads? Install an adblocker like uBlock Origin or use a Europe-based privacy-friendly browser like Vivaldi or Mullvad.

Admixtools admixtools2 TUTORIAL for WINDOWS.

Just in case anyone wants to know, you convert eigenstrat format files to plink by creating a parfile in plink like so (This is also a note for me so I can remember; re-learning this stuff after a year can take a while):

1. Open Ubantu and navigate to where you have it mount, and where you send your files. Ming is on a D drive called "UbuntuJovialisHome":

Code:
cd /mnt/d/UbuntuJovialisHome/

2. Create the parfile, I use nano:

Code:
nano parfile

3. In the nano file editor type this:

Code:
genotypename:    /mnt/d/UbuntuJovialisHome/v62.0_HO_public.geno
snpname:         /mnt/d/UbuntuJovialisHome/v62.0_HO_public.snp
indivname:       /mnt/d/UbuntuJovialisHome/v62.0_HO_public.ind
outputformat:    PACKEDPED
genotypeoutname: /mnt/d/UbuntuJovialisHome/data.bed
snpoutname:      /mnt/d/UbuntuJovialisHome/data.bim
indivoutname:    /mnt/d/UbuntuJovialisHome/data.fam
familynames:     NO

4. Finally, execute the parfile to produce the BED, BIM, and FAM:

Code:
convertf -p /mnt/d/UbuntuJovialisHome/parfile

After a while it will finish processing and generate in the destination you are sending it to.

Optional. Here's a quick way to verify in Ubuntu when it is done processing (or you can just navigate there):

Code:
ls -lh /mnt/d/UbuntuJovialisHome/v62.0_HO_public.*

Finally have the process documented from using my HG19-aligned 23andme txt file produced last year from when I processed it from FASTQ


Code:
# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --allow-no-sex --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 5: Attempt an initial merge and identify problematic SNPs (multiallelic or inconsistent strand).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62_common_snps --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset to keep only SNPs present in AADR.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_PLINK_binary --extract /mnt/d/UbuntuJovialisHome/v62.0_HO_public.bim --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_cleaned --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check

GAybZqw.png
 
Last edited:
This is is how you merge with PLINK:

My raw dna file in this example is eupator.txt in the 23ame format, if it's not in that format, convert it with dna kit studio or similar.

Code:
plink --23file eupator.txt --make-bed --out eupator

That will create the bed filed eupator.bed. Use an editor to rename your sample in the file to your liking.

Then use your converted .bed dataset to merge. In this example it's v54_HO_public.

Code:
plink --allow-no-sex --bfile v54_HO_public --bmerge eupator --out v54_HO_public_eupator

The merged file is 54_HO_public_merged.

If you get strand inconsistency problems you can solve them, as they appear, with the following:

Code:
plink --allow-no-sex --bfile eupator --flip v54_HO_public_eupator-merge.missnp --make-bed --out eupator_flipped

plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_flipped --make-bed --out v54_HO_public_eupator2

plink --allow-no-sex --bfile eupator_flipped --exclude v54_HO_public_eupator2-merge.missnp --make-bed --out eupator_filtered

plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_filtered --make-bed --out v54_HO_public_final

Your new merged dataset is v54_HO_public_final, you can use it straight away with at2 as at2 reads plink files.
I had a go mimicking the same process using PLINK i.e. merging my individual sample file with the AADR dataset (although I used an earlier version – v44.3_HO_public – which I already had in plink format) and although it merged (after the flipping and excluding etc), the genotyping rate was very compromised from the original non-merged files. I tried a number of variations of the merging procedures and was able to get an improved result using the --write-snplist trick but the quality was still noticeably lower.

If you generate a --missing report on your non-merged v54_HO_public file and then on your v54_HO_public_final file you'll probably see what I mean.
 
Finally have the process documented from using my HG19-aligned 23andme txt file produced last year from when I processed it from FASTQ


Code:
# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4a: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 4b: Filter the AADR dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/v62_common_snps

# Step 5: Attempt an initial merge and identify problematic SNPs (multiallelic or inconsistent strand).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62_common_snps --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset to keep only SNPs present in AADR.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_PLINK_binary --extract /mnt/d/UbuntuJovialisHome/v62.0_HO_public.bim --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_cleaned --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check

GAybZqw.png
Any reason you aren't using Eigensoft MERGEIT to merge your individual file with the AADR data set (in Eigenstrat format)? It's much simpler and by default it does all the strand flipping and exclusions where necessary so you don't need to go through as many steps.
Once you've created the parfile listing the files to merge it's just ./mergeit - p <parfile> and it's done.
 
Any reason you aren't using Eigensoft MERGEIT to merge your individual file with the AADR data set (in Eigenstrat format)? It's much simpler and by default it does all the strand flipping and exclusions where necessary so you don't need to go through as many steps.
Once you've created the parfile listing the files to merge it's just ./mergeit - p <parfile> and it's done.
I actually did try it initially last year before I did it in Plink, but there was something that I am currently forgetting about the process that wasn't quite right.
 
I've started including set.seed(12345) in my script, and has massively reduced inconsistency in my results. Though it is not 100% perfect. Is this a common practice in qpadm? Because it seems to be conducive to consistent results, and it replicates results I was able to reproduce before I implemented it.


Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"
my_f2_dir <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\my_f2_dir_v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)
library(parallel)

# Set a random seed for reproducibility
set.seed(12345)

# Define target population
target = c('Jovialis')

# Define right populations
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG',
           'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG',
           'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Generate f2 stats only once to avoid redundant computations
mypops = c(right, target)
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

# Define the list of optional populations (only LBA samples)
optional_pops = c('Greece_Aidonia_LBA.AG', 'Greece_Crete_Aposelemis_LBA.AG',
                  'Greece_GlykaNera_LBA.AG', 'Greece_Koukounaries_LBA.AG',
                  'Greece_Lazarides_LBA.AG', 'Greece_Tiryns_LBA.AG',
                  'Greece_Crete_Chania_LBA.AG', 'Greece_PalaceOfNestor_BA.AG',
                  'Greece_Attica_Kolikrepi_BA.AG', 'Greece_Crete_Zakros_BA.AG',
                  'Greece_Kastrouli_BA.AG', 'Greece_Salamis_BA.AG',
                  'Greece_Galatas_BA.AG', 'Greece_Peristeria_BA.AG',
                  'Greece_Crete_Armenoi_BA.AG', 'Greece_Crete_Odigitria_BA.AG',
                  'Greece_Crete_Krousonas_LBA.SG')

# Define the sets
sets = list(
    list(must_use = c('Italy_PianSultano_BA.SG'))
)

# Initialize list to store all results
all_results_list = list()

# Parallelization setup
num_cores <- detectCores() - 1  # Reserve 1 core for system tasks
cl <- makeCluster(num_cores)

# Export necessary objects to the cluster
clusterEvalQ(cl, library(admixtools))
clusterExport(cl, list("prefix", "right", "target", "qpadm", "optional_pops", "sets"))

# Function to create combinations and run qpadm in parallel
run_qpadm_for_set_parallel <- function(set_index, sets, optional_pops, prefix, right, target, cl) {
    set_must_use <- sets[[set_index]]$must_use
    
    results_list <- parLapply(cl, optional_pops, function(pop) {
        left_set = c(set_must_use, pop)
        
        # Run qpadm model and handle errors gracefully
        tryCatch({
            results = qpadm(prefix, left_set, right, target, allsnps = TRUE)
            return(list(left = left_set, weights = results$weights, popdrop = results$popdrop))
        }, error = function(e) {
            return(list(left = left_set, error = e$message))
        })
    })
    return(results_list)
}

# Iterate over each set and run qpadm for all combinations in parallel
for (i in 1:length(sets)) {
    cat("Running Set", i, "\n")
    set_results <- run_qpadm_for_set_parallel(i, sets, optional_pops, prefix, right, target, cl)
    
    # Append the results of the current set to the overall results list
    all_results_list[[i]] <- set_results
}

# Stop the parallel cluster after processing
stopCluster(cl)

# View results
print(all_results_list)
 
I've started including set.seed(12345) in my script, and has massively reduced inconsistency in my results. Though it is not 100% perfect. Is this a common practice in qpadm? Because it seems to be conducive to consistent results, and it replicates results I was able to reproduce before I implemented it.


Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"
my_f2_dir <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\my_f2_dir_v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)
library(parallel)

# Set a random seed for reproducibility
set.seed(12345)

# Define target population
target = c('Jovialis')

# Define right populations
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG',
           'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG',
           'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Generate f2 stats only once to avoid redundant computations
mypops = c(right, target)
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

# Define the list of optional populations (only LBA samples)
optional_pops = c('Greece_Aidonia_LBA.AG', 'Greece_Crete_Aposelemis_LBA.AG',
                  'Greece_GlykaNera_LBA.AG', 'Greece_Koukounaries_LBA.AG',
                  'Greece_Lazarides_LBA.AG', 'Greece_Tiryns_LBA.AG',
                  'Greece_Crete_Chania_LBA.AG', 'Greece_PalaceOfNestor_BA.AG',
                  'Greece_Attica_Kolikrepi_BA.AG', 'Greece_Crete_Zakros_BA.AG',
                  'Greece_Kastrouli_BA.AG', 'Greece_Salamis_BA.AG',
                  'Greece_Galatas_BA.AG', 'Greece_Peristeria_BA.AG',
                  'Greece_Crete_Armenoi_BA.AG', 'Greece_Crete_Odigitria_BA.AG',
                  'Greece_Crete_Krousonas_LBA.SG')

# Define the sets
sets = list(
    list(must_use = c('Italy_PianSultano_BA.SG'))
)

# Initialize list to store all results
all_results_list = list()

# Parallelization setup
num_cores <- detectCores() - 1  # Reserve 1 core for system tasks
cl <- makeCluster(num_cores)

# Export necessary objects to the cluster
clusterEvalQ(cl, library(admixtools))
clusterExport(cl, list("prefix", "right", "target", "qpadm", "optional_pops", "sets"))

# Function to create combinations and run qpadm in parallel
run_qpadm_for_set_parallel <- function(set_index, sets, optional_pops, prefix, right, target, cl) {
    set_must_use <- sets[[set_index]]$must_use
   
    results_list <- parLapply(cl, optional_pops, function(pop) {
        left_set = c(set_must_use, pop)
       
        # Run qpadm model and handle errors gracefully
        tryCatch({
            results = qpadm(prefix, left_set, right, target, allsnps = TRUE)
            return(list(left = left_set, weights = results$weights, popdrop = results$popdrop))
        }, error = function(e) {
            return(list(left = left_set, error = e$message))
        })
    })
    return(results_list)
}

# Iterate over each set and run qpadm for all combinations in parallel
for (i in 1:length(sets)) {
    cat("Running Set", i, "\n")
    set_results <- run_qpadm_for_set_parallel(i, sets, optional_pops, prefix, right, target, cl)
   
    # Append the results of the current set to the overall results list
    all_results_list[[i]] <- set_results
}

# Stop the parallel cluster after processing
stopCluster(cl)

# View results
print(all_results_list)
I haven't used qpAdm (or any component of AdmixTools2) as of yet but from your description it most likely by default generates the seed randomly.
I've been using ADMIXTURE which by default is set to a static seed but it's common practice in formal studies to generate the seed randomly by adding a flag (which creates the seed from the time) and do 10 or more runs of the same calculation. Then they take the average and/or median as their statistic.
However I've never noticed any papers mentioning this methodology with qpAdm – only with ADMIXTURE.
 
Thanks! I've been getting help from ChatGPT regarding modifying the code (with caution of course). However, this implementation of set.seed(12345) seems to be improving reproducibility. I also had the AI search for if it is a common practice, but it also seemed to be unable to locate it anywhere in documentation. At any rate, it seems to reproduce consistent results back to back most of the time. With an occasional anomaly, though typically back to the same result.

I've tested it against consistent results that are produced without the set.seed(12345) implemented, and it produces the same consistent result.

I find this to be very helpful, considering without it, reproducing the same result can take a few runs.

I should also mention for the code I last posted on this thread; it also utilizes library(parallel). But set.seed(12345) also works without it.
 
Thanks! I've been getting help from ChatGPT regarding modifying the code (with caution of course). However, this implementation of set.seed(12345) seems to be improving reproducibility. I also had the AI search for if it is a common practice, but it also seemed to be unable to locate it anywhere in documentation. At any rate, it seems to reproduce consistent results back to back most of the time. With an occasional anomaly, though typically back to the same result.

I've tested it against consistent results that are produced without the set.seed(12345) implemented, and it produces the same consistent result.

I find this to be very helpful, considering without it, reproducing the same result can take a few runs.

I should also mention for the code I last posted on this thread; it also utilizes library(parallel). But set.seed(12345) also works without it.
So you actually had to modify the code to set it to a static seed? I would have expected such a function to be already implemented as a user option in the software (e.g. by use of a flag). You also mentioned that it still occasionally gives an alternate result even with your set.seed in the code? Technically it should produce the same result 100% of the time for runs of the same data input if the seed is constant.
 
Thank you for prompting me to get to the bottom of the issue!

I strive for accuracy, and I found out what the problem was. It was the arrangement of my code:

Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)
 
With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()


Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)
 
With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()


Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)
Presuming everything is working the way it's supposed to, any result that meets the criteria (p-value threshold etc) is considered a plausible model. However I would be inclined to go with the more frequent result. Was the 2/10 result much different?
 
With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()


Code:
# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)
I found how to change the seed in qpAdm. However this is when running it with a parameter file but it looks like you a running it via R-scripts. The optional parameter you have to add (in the parfile) is,

seed: Specifies the seed to be used. If set to 0, a random seed will be chosen according to the system clock. Default: 0.

[source: https://reich.hms.harvard.edu/sites...les/Harney_2021_Genetics_qpAdm_supplement.pdf]

So that confirms that qpAdm is set to a random seed by default and can be easily changed by the user. As to how you do the equivalent in an R-script I'm not sure.
 
I found how to change the seed in qpAdm. However this is when running it with a parameter file but it looks like you a running it via R-scripts. The optional parameter you have to add (in the parfile) is,

seed: Specifies the seed to be used. If set to 0, a random seed will be chosen according to the system clock. Default: 0.

[source: https://reich.hms.harvard.edu/sites...les/Harney_2021_Genetics_qpAdm_supplement.pdf]

So that confirms that qpAdm is set to a random seed by default and can be easily changed by the user. As to how you do the equivalent in an R-script I'm not sure.
If set to default, it probably means the randomness is done by design. Given that my model can produce 8 out of 10 consistent results, may mean it has a low sensitivity to randomness, while a non-robust model typically generates many inconsistent results.

Thank you for helping me get to the bottom of the issue!
 
If set to default, it probably means the randomness is done by design. Given that my model can produce 8 out of 10 consistent results, may mean it has a low sensitivity to randomness, while a non-robust model typically generates many inconsistent results.

Thank you for helping me get to the bottom of the issue!
no problem. I'm planning on giving qpAdm a whirl myself in the next couple of months or so.
 
Does the difference in format between packped and ped have any significant effect on the admixtools results?
 
I am using Linux and I have installed qpAdm, but I have this problem, why my .geno file is different? I cannot even open i, I changed the properties, I delated and download again here: https://reich.hms.harvard.edu/allen...le-genotypes-present-day-and-ancient-dna-data

And nothing! Do you know why is happening that and how to fixed it?

UM40qcv.jpg
Did you download the associated .ind and .snp files? (Because you need all 3 to make it work in qpAdm). You don't open the .geno file. You reference the files you are using for your analysis in your qpAdm commands.
 
Did you download the associated .ind and .snp files? (Because you need all 3 to make it work in qpAdm). You don't open the .geno file. You reference the files you are using for your analysis in your qpAdm commands.

Yes, I download the Tarball all files.

My ram in the virtual machine is 5187 MB and processors/CPU: 2

I follow these 2 guides:



And when I run in the Terminal, something killed the process and Idk why is not giving me the results:

EgzEuug.jpg


lvruFR0.jpg


What is happening? Because qpAdm seems to be good installed:

nFGLUpO.jpg


And this is my bin folder:

weBkw9m.jpg
 
Last edited:
And when I am using the Terminal Super User Mode:

tAMW4gy.jpg


Idk what is wrong...
 
Back
Top