Admixtools admixtools2 TUTORIAL for WINDOWS.

Jovialis · Sep 22, 2024

Jovialis said:
Just in case anyone wants to know, you convert eigenstrat format files to plink by creating a parfile in plink like so (This is also a note for me so I can remember; re-learning this stuff after a year can take a while):

1. Open Ubantu and navigate to where you have it mount, and where you send your files. Ming is on a D drive called "UbuntuJovialisHome":

Code:

cd /mnt/d/UbuntuJovialisHome/

2. Create the parfile, I use nano:

Code:

nano parfile

3. In the nano file editor type this:

Code:

genotypename: /mnt/d/UbuntuJovialisHome/v62.0_HO_public.geno snpname: /mnt/d/UbuntuJovialisHome/v62.0_HO_public.snp indivname: /mnt/d/UbuntuJovialisHome/v62.0_HO_public.ind outputformat: PACKEDPED genotypeoutname: /mnt/d/UbuntuJovialisHome/data.bed snpoutname: /mnt/d/UbuntuJovialisHome/data.bim indivoutname: /mnt/d/UbuntuJovialisHome/data.fam familynames: NO

4. Finally, execute the parfile to produce the BED, BIM, and FAM:

Code:

convertf -p /mnt/d/UbuntuJovialisHome/parfile

After a while it will finish processing and generate in the destination you are sending it to.

Optional. Here's a quick way to verify in Ubuntu when it is done processing (or you can just navigate there):

Code:

ls -lh /mnt/d/UbuntuJovialisHome/v62.0_HO_public.*

Finally have the process documented from using my HG19-aligned 23andme txt file produced last year from when I processed it from FASTQ

Code:

# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --allow-no-sex --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 5: Attempt an initial merge and identify problematic SNPs (multiallelic or inconsistent strand).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62_common_snps --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset to keep only SNPs present in AADR.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_PLINK_binary --extract /mnt/d/UbuntuJovialisHome/v62.0_HO_public.bim --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_cleaned --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check

Jovialis · Sep 22, 2024

Happy things are working:

https://twitter.com/x/status/1837873678366716357

Celtion · Oct 4, 2024

eupator said:
This is is how you merge with PLINK:

My raw dna file in this example is eupator.txt in the 23ame format, if it's not in that format, convert it with dna kit studio or similar.

Code:

plink --23file eupator.txt --make-bed --out eupator

That will create the bed filed eupator.bed. Use an editor to rename your sample in the file to your liking.

Then use your converted .bed dataset to merge. In this example it's v54_HO_public.

Code:

plink --allow-no-sex --bfile v54_HO_public --bmerge eupator --out v54_HO_public_eupator

The merged file is 54_HO_public_merged.

If you get strand inconsistency problems you can solve them, as they appear, with the following:

Code:

plink --allow-no-sex --bfile eupator --flip v54_HO_public_eupator-merge.missnp --make-bed --out eupator_flipped plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_flipped --make-bed --out v54_HO_public_eupator2 plink --allow-no-sex --bfile eupator_flipped --exclude v54_HO_public_eupator2-merge.missnp --make-bed --out eupator_filtered plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_filtered --make-bed --out v54_HO_public_final

Your new merged dataset is v54_HO_public_final, you can use it straight away with at2 as at2 reads plink files.

I had a go mimicking the same process using PLINK i.e. merging my individual sample file with the AADR dataset (although I used an earlier version – v44.3_HO_public – which I already had in plink format) and although it merged (after the flipping and excluding etc), the genotyping rate was very compromised from the original non-merged files. I tried a number of variations of the merging procedures and was able to get an improved result using the --write-snplist trick but the quality was still noticeably lower.

If you generate a --missing report on your non-merged v54_HO_public file and then on your v54_HO_public_final file you'll probably see what I mean.

Celtion · Oct 4, 2024

Jovialis said:

Finally have the process documented from using my HG19-aligned 23andme txt file produced last year from when I processed it from FASTQ

Code:

# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4a: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 4b: Filter the AADR dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/v62_common_snps

# Step 5: Attempt an initial merge and identify problematic SNPs (multiallelic or inconsistent strand).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62_common_snps --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset to keep only SNPs present in AADR.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_PLINK_binary --extract /mnt/d/UbuntuJovialisHome/v62.0_HO_public.bim --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_cleaned --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check

Any reason you aren't using Eigensoft MERGEIT to merge your individual file with the AADR data set (in Eigenstrat format)? It's much simpler and by default it does all the strand flipping and exclusions where necessary so you don't need to go through as many steps.
Once you've created the parfile listing the files to merge it's just ./mergeit - p <parfile> and it's done.

Jovialis · Oct 4, 2024

Celtion said:
Any reason you aren't using Eigensoft MERGEIT to merge your individual file with the AADR data set (in Eigenstrat format)? It's much simpler and by default it does all the strand flipping and exclusions where necessary so you don't need to go through as many steps.
Once you've created the parfile listing the files to merge it's just ./mergeit - p <parfile> and it's done.

I actually did try it initially last year before I did it in Plink, but there was something that I am currently forgetting about the process that wasn't quite right.

Jovialis · Oct 17, 2024

I've started including set.seed(12345) in my script, and has massively reduced inconsistency in my results. Though it is not 100% perfect. Is this a common practice in qpadm? Because it seems to be conducive to consistent results, and it replicates results I was able to reproduce before I implemented it.

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"
my_f2_dir <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\my_f2_dir_v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)
library(parallel)

# Set a random seed for reproducibility
set.seed(12345)

# Define target population
target = c('Jovialis')

# Define right populations
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG',
           'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG',
           'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Generate f2 stats only once to avoid redundant computations
mypops = c(right, target)
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

# Define the list of optional populations (only LBA samples)
optional_pops = c('Greece_Aidonia_LBA.AG', 'Greece_Crete_Aposelemis_LBA.AG',
                  'Greece_GlykaNera_LBA.AG', 'Greece_Koukounaries_LBA.AG',
                  'Greece_Lazarides_LBA.AG', 'Greece_Tiryns_LBA.AG',
                  'Greece_Crete_Chania_LBA.AG', 'Greece_PalaceOfNestor_BA.AG',
                  'Greece_Attica_Kolikrepi_BA.AG', 'Greece_Crete_Zakros_BA.AG',
                  'Greece_Kastrouli_BA.AG', 'Greece_Salamis_BA.AG',
                  'Greece_Galatas_BA.AG', 'Greece_Peristeria_BA.AG',
                  'Greece_Crete_Armenoi_BA.AG', 'Greece_Crete_Odigitria_BA.AG',
                  'Greece_Crete_Krousonas_LBA.SG')

# Define the sets
sets = list(
    list(must_use = c('Italy_PianSultano_BA.SG'))
)

# Initialize list to store all results
all_results_list = list()

# Parallelization setup
num_cores <- detectCores() - 1  # Reserve 1 core for system tasks
cl <- makeCluster(num_cores)

# Export necessary objects to the cluster
clusterEvalQ(cl, library(admixtools))
clusterExport(cl, list("prefix", "right", "target", "qpadm", "optional_pops", "sets"))

# Function to create combinations and run qpadm in parallel
run_qpadm_for_set_parallel <- function(set_index, sets, optional_pops, prefix, right, target, cl) {
    set_must_use <- sets[[set_index]]$must_use
    
    results_list <- parLapply(cl, optional_pops, function(pop) {
        left_set = c(set_must_use, pop)
        
        # Run qpadm model and handle errors gracefully
        tryCatch({
            results = qpadm(prefix, left_set, right, target, allsnps = TRUE)
            return(list(left = left_set, weights = results$weights, popdrop = results$popdrop))
        }, error = function(e) {
            return(list(left = left_set, error = e$message))
        })
    })
    return(results_list)
}

# Iterate over each set and run qpadm for all combinations in parallel
for (i in 1:length(sets)) {
    cat("Running Set", i, "\n")
    set_results <- run_qpadm_for_set_parallel(i, sets, optional_pops, prefix, right, target, cl)
    
    # Append the results of the current set to the overall results list
    all_results_list[[i]] <- set_results
}

# Stop the parallel cluster after processing
stopCluster(cl)

# View results
print(all_results_list)

Celtion · Oct 17, 2024

Jovialis said:

I've started including set.seed(12345) in my script, and has massively reduced inconsistency in my results. Though it is not 100% perfect. Is this a common practice in qpadm? Because it seems to be conducive to consistent results, and it replicates results I was able to reproduce before I implemented it.

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"
my_f2_dir <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\my_f2_dir_v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)
library(parallel)

# Set a random seed for reproducibility
set.seed(12345)

# Define target population
target = c('Jovialis')

# Define right populations
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG',
           'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG',
           'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Generate f2 stats only once to avoid redundant computations
mypops = c(right, target)
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)

# Define the list of optional populations (only LBA samples)
optional_pops = c('Greece_Aidonia_LBA.AG', 'Greece_Crete_Aposelemis_LBA.AG',
                  'Greece_GlykaNera_LBA.AG', 'Greece_Koukounaries_LBA.AG',
                  'Greece_Lazarides_LBA.AG', 'Greece_Tiryns_LBA.AG',
                  'Greece_Crete_Chania_LBA.AG', 'Greece_PalaceOfNestor_BA.AG',
                  'Greece_Attica_Kolikrepi_BA.AG', 'Greece_Crete_Zakros_BA.AG',
                  'Greece_Kastrouli_BA.AG', 'Greece_Salamis_BA.AG',
                  'Greece_Galatas_BA.AG', 'Greece_Peristeria_BA.AG',
                  'Greece_Crete_Armenoi_BA.AG', 'Greece_Crete_Odigitria_BA.AG',
                  'Greece_Crete_Krousonas_LBA.SG')

# Define the sets
sets = list(
    list(must_use = c('Italy_PianSultano_BA.SG'))
)

# Initialize list to store all results
all_results_list = list()

# Parallelization setup
num_cores <- detectCores() - 1  # Reserve 1 core for system tasks
cl <- makeCluster(num_cores)

# Export necessary objects to the cluster
clusterEvalQ(cl, library(admixtools))
clusterExport(cl, list("prefix", "right", "target", "qpadm", "optional_pops", "sets"))

# Function to create combinations and run qpadm in parallel
run_qpadm_for_set_parallel <- function(set_index, sets, optional_pops, prefix, right, target, cl) {
    set_must_use <- sets[[set_index]]$must_use
   
    results_list <- parLapply(cl, optional_pops, function(pop) {
        left_set = c(set_must_use, pop)
       
        # Run qpadm model and handle errors gracefully
        tryCatch({
            results = qpadm(prefix, left_set, right, target, allsnps = TRUE)
            return(list(left = left_set, weights = results$weights, popdrop = results$popdrop))
        }, error = function(e) {
            return(list(left = left_set, error = e$message))
        })
    })
    return(results_list)
}

# Iterate over each set and run qpadm for all combinations in parallel
for (i in 1:length(sets)) {
    cat("Running Set", i, "\n")
    set_results <- run_qpadm_for_set_parallel(i, sets, optional_pops, prefix, right, target, cl)
   
    # Append the results of the current set to the overall results list
    all_results_list[[i]] <- set_results
}

# Stop the parallel cluster after processing
stopCluster(cl)

# View results
print(all_results_list)

I haven't used qpAdm (or any component of AdmixTools2) as of yet but from your description it most likely by default generates the seed randomly.
I've been using ADMIXTURE which by default is set to a static seed but it's common practice in formal studies to generate the seed randomly by adding a flag (which creates the seed from the time) and do 10 or more runs of the same calculation. Then they take the average and/or median as their statistic.
However I've never noticed any papers mentioning this methodology with qpAdm – only with ADMIXTURE.

Jovialis · Oct 17, 2024

Thanks! I've been getting help from ChatGPT regarding modifying the code (with caution of course). However, this implementation of set.seed(12345) seems to be improving reproducibility. I also had the AI search for if it is a common practice, but it also seemed to be unable to locate it anywhere in documentation. At any rate, it seems to reproduce consistent results back to back most of the time. With an occasional anomaly, though typically back to the same result.

I've tested it against consistent results that are produced without the set.seed(12345) implemented, and it produces the same consistent result.

I find this to be very helpful, considering without it, reproducing the same result can take a few runs.

I should also mention for the code I last posted on this thread; it also utilizes library(parallel). But set.seed(12345) also works without it.

Celtion · Oct 18, 2024

Jovialis said:
Thanks! I've been getting help from ChatGPT regarding modifying the code (with caution of course). However, this implementation of set.seed(12345) seems to be improving reproducibility. I also had the AI search for if it is a common practice, but it also seemed to be unable to locate it anywhere in documentation. At any rate, it seems to reproduce consistent results back to back most of the time. With an occasional anomaly, though typically back to the same result.

I've tested it against consistent results that are produced without the set.seed(12345) implemented, and it produces the same consistent result.

I find this to be very helpful, considering without it, reproducing the same result can take a few runs.

I should also mention for the code I last posted on this thread; it also utilizes library(parallel). But set.seed(12345) also works without it.

So you actually had to modify the code to set it to a static seed? I would have expected such a function to be already implemented as a user option in the software (e.g. by use of a flag). You also mentioned that it still occasionally gives an alternate result even with your set.seed in the code? Technically it should produce the same result 100% of the time for runs of the same data input if the seed is constant.

Jovialis · Oct 18, 2024

Thank you for prompting me to get to the bottom of the issue!

I strive for accuracy, and I found out what the problem was. It was the arrangement of my code:

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)

Jovialis · Oct 18, 2024

With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)

Celtion · Oct 20, 2024

Jovialis said:

With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)

Presuming everything is working the way it's supposed to, any result that meets the criteria (p-value threshold etc) is considered a plausible model. However I would be inclined to go with the more frequent result. Was the 2/10 result much different?

Celtion · Oct 20, 2024

Jovialis said:

With the new script I executed the model 10 times, 8 times were consistent, but 2 times is showed different results (though consistent with each other).

Would this be considered acceptable? It is a massive improvement to say the least.

This is from a very simplified model, without the set.seed()

Code:

# ---- Skourtanioti et al. 2023 qpAdm Reference model (Plink format AADR 62.0) ----

# Set the correct dataset path
prefix <- "D:\\Bioinformatics\\01_Admixtools_Dataset\\V62.0_HO_Plink_Merged_Jovialis\\v62.0_Jovialis_Merged"

# Load necessary libraries
library(admixtools)
library(tidyverse)

# Define populations
target <- c('Jovialis')
left <- c('Greece_Crete_HgCharalambos_EMBA.AG', 'Italy_PianSultano_BA.SG')
right <- c('Ethiopia_4500BP.AG', 'Russia_UstIshim_IUP.DG', 'Russia_Kostenki14_UP.SG', 'Serbia_IronGates_Mesolithic.AG', 'WHG', 'CHG', 'EEHG', 'W_Anatolia_N', 'Iran_GanjDareh_N.AG', 'Israel_Natufian.AG', 'ANE')

# Run the Model directly using genotype files
results <- qpadm(prefix, left, right, target, allsnps = TRUE)

# Print Results
print(results$weights)
print(results$popdrop)

I found how to change the seed in qpAdm. However this is when running it with a parameter file but it looks like you a running it via R-scripts. The optional parameter you have to add (in the parfile) is,

seed: Specifies the seed to be used. If set to 0, a random seed will be chosen according to the system clock. Default: 0.

[source: https://reich.hms.harvard.edu/sites...les/Harney_2021_Genetics_qpAdm_supplement.pdf]

So that confirms that qpAdm is set to a random seed by default and can be easily changed by the user. As to how you do the equivalent in an R-script I'm not sure.

Jovialis · Oct 21, 2024

Celtion said:
I found how to change the seed in qpAdm. However this is when running it with a parameter file but it looks like you a running it via R-scripts. The optional parameter you have to add (in the parfile) is,

seed: Specifies the seed to be used. If set to 0, a random seed will be chosen according to the system clock. Default: 0.

[source: https://reich.hms.harvard.edu/sites...les/Harney_2021_Genetics_qpAdm_supplement.pdf]

So that confirms that qpAdm is set to a random seed by default and can be easily changed by the user. As to how you do the equivalent in an R-script I'm not sure.

If set to default, it probably means the randomness is done by design. Given that my model can produce 8 out of 10 consistent results, may mean it has a low sensitivity to randomness, while a non-robust model typically generates many inconsistent results.

Thank you for helping me get to the bottom of the issue!

Celtion · Oct 21, 2024

Jovialis said:
If set to default, it probably means the randomness is done by design. Given that my model can produce 8 out of 10 consistent results, may mean it has a low sensitivity to randomness, while a non-robust model typically generates many inconsistent results.

Thank you for helping me get to the bottom of the issue!

no problem. I'm planning on giving qpAdm a whirl myself in the next couple of months or so.

Jalisciense · Oct 26, 2024

I am using Linux and I have installed qpAdm, but I have this problem, why my .geno file is different? I cannot even open i, I changed the properties, I delated and download again here: https://reich.hms.harvard.edu/allen...le-genotypes-present-day-and-ancient-dna-data

And nothing! Do you know why is happening that and how to fixed it?

Poxy · Oct 26, 2024

Does the difference in format between packped and ped have any significant effect on the admixtools results?

Celtion · Oct 26, 2024

Jalisciense said:
I am using Linux and I have installed qpAdm, but I have this problem, why my .geno file is different? I cannot even open i, I changed the properties, I delated and download again here: https://reich.hms.harvard.edu/allen...le-genotypes-present-day-and-ancient-dna-data

And nothing! Do you know why is happening that and how to fixed it?

Did you download the associated .ind and .snp files? (Because you need all 3 to make it work in qpAdm). You don't open the .geno file. You reference the files you are using for your analysis in your qpAdm commands.

Jalisciense · Oct 26, 2024

Celtion said:
Did you download the associated .ind and .snp files? (Because you need all 3 to make it work in qpAdm). You don't open the .geno file. You reference the files you are using for your analysis in your qpAdm commands.

Yes, I download the Tarball all files.

My ram in the virtual machine is 5187 MB and processors/CPU: 2

I follow these 2 guides:

Stepwise tutorial to Install Admixtools & Plink

How to install admixtools, qpadm, qpgraph and Plink

a-genetics.blogspot.com

https://www.reddit.com/r/IndoEuropean/comments/ta47ei/qpadm_and_other_admixtools_tutorial

And when I run in the Terminal, something killed the process and Idk why is not giving me the results:

What is happening? Because qpAdm seems to be good installed:

And this is my bin folder:

Jalisciense · Oct 26, 2024

And when I am using the Terminal Super User Mode:

Idk what is wrong...

Admixtools admixtools2 TUTORIAL for WINDOWS.

Advisor

Advisor

Regular Member

Regular Member

Advisor

Advisor

Regular Member

Advisor

Regular Member

Advisor

Advisor

Regular Member

Regular Member

Advisor

Regular Member

Regular Member

Regular Member

Regular Member

Regular Member

Regular Member