Admixtools admixtools2 TUTORIAL for WINDOWS.

It seems best to produce PCAs in smartpca with the 1240K dataset: it has higher resolution and is consistent with what aDNA studies use.
Kind of unrelated, but what do you think about v54.1 1240K vs. v62 1240K? v62 is of course more up to date, but I see weird samples and labels in it, and v54 seems more curated or "refined". Which one do you use? I get slightly different results.
 
Not sure to be honest.
 
Here's a refined version of the PCA along with the R script used:

lBjnx5C.png


Code:
# Load necessary libraries
library(ggplot2)
library(dplyr)

# Set working directory
setwd("D:/Bioinformatics/01_Admixtools_Dataset/V62.0_HO_Eigenstrat_Merged_Jovialis/W_Eurasia_Mod_aDNA")

# Read the eigenvalues
evals <- scan("projected.eval.txt", quiet = TRUE)

# Read the eigenvectors
evecs <- read.table("projected.evec.txt", header = FALSE, stringsAsFactors = FALSE)

# Extract individual IDs and population labels
individuals <- as.character(evecs$V1)
populations <- as.character(evecs$V12)  # Column 1 is the ID, then one column per PC, then the label: V12 assumes 10 PCs were output (numoutevec: 10)

# Extract the first two principal components and flip both axes
pc1 <- -as.numeric(evecs$V2)  # Horizontal flip
pc2 <- -as.numeric(evecs$V3)  # Vertical flip

# Create a data frame for plotting
pca_data <- data.frame(Individual = individuals, Population = populations, PC1 = pc1, PC2 = pc2, stringsAsFactors = FALSE)

# Remove rows with NA values (if any)
pca_data <- na.omit(pca_data)

# Define populations to highlight
highlighted_pops <- c(
    "Jovialis", "Armenian.HO", "Iranian.HO", "Turkish.HO", "Albanian.HO", "Italian_North.HO",
    "Bulgarian.HO", "Cypriot.HO", "Greek.HO", "Italian_South.HO", "Maltese.HO", "Sicilian.HO",
    "Italian_Central.HO", "English.HO", "French.HO", "Icelandic.HO", "Norwegian.HO", "Orcadian.HO",
    "Scottish.HO", "BedouinA.HO", "BedouinB.HO", "Jordanian.HO", "Palestinian.HO", "Saudi.HO",
    "Syrian.HO", "Abkhasian.HO", "Adygei.HO", "Balkar.HO", "Chechen.HO", "Georgian.HO", "Kumyk.HO",
    "Lezgin.HO", "Russia_NorthOssetian.HO", "Jew_Ashkenazi.HO", "Jew_Georgian.HO", "Jew_Iranian.HO",
    "Jew_Iraqi.HO", "Jew_Libyan.HO", "Jew_Moroccan.HO", "Jew_Tunisian.HO", "Jew_Turkish.HO",
    "Jew_Yemenite.HO", "Basque.HO", "Spanish.HO", "Spanish_North.HO", "Druze.HO", "Lebanese.HO",
    "Belarusian.HO", "Croatian.HO", "Czech.HO", "Estonian.HO", "Hungarian.HO", "Lithuanian.HO",
    "Ukrainian.HO", "IBS_CanaryIslands.DG", "Sardinian.HO", "Finnish.HO", "Mordovian.HO", "Russian.HO"
)

# Filter data to include only highlighted populations
pca_data <- pca_data %>% filter(Population %in% highlighted_pops)

# Assign groups for coloring and filling: strip the dataset suffix (.HO or .DG)
# from each label, except for the two populations that get a custom display name
pca_data <- pca_data %>%
    mutate(
        Group = case_when(
            Population == "Russia_NorthOssetian.HO" ~ "North_Ossetian",
            Population == "IBS_CanaryIslands.DG" ~ "Canary_Islands",
            TRUE ~ sub("\\.(HO|DG)$", "", Population)
        )
    )

# Assign colors with a focus on darker shades and valid color names
custom_colors <- c(
    "Jovialis" = "darkgoldenrod", "Armenian" = "darkblue", "Iranian" = "darkgreen",
    "Turkish" = "orange", "Albanian" = "green", "Italian_North" = "darkorange",
    "Bulgarian" = "steelblue", "Cypriot" = "darkmagenta", "Greek" = "saddlebrown",
    "Italian_South" = "darkorchid3", "Maltese" = "blue", "Sicilian" = "darkolivegreen",
    "Italian_Central" = "midnightblue", "English" = "firebrick", "French" = "chocolate4",
    "Icelandic" = "darkslategray", "Norwegian" = "mediumblue", "Orcadian" = "darkslateblue",
    "Scottish" = "darkseagreen", "BedouinA" = "darkcyan", "BedouinB" = "deepskyblue4",
    "Jordanian" = "darkred", "Palestinian" = "darkgreen", "Saudi" = "darkgoldenrod4",
    "Syrian" = "mediumvioletred", "Abkhasian" = "brown4", "Adygei" = "khaki4",
    "Balkar" = "purple4", "Chechen" = "royalblue4", "Georgian" = "brown3",
    "Kumyk" = "forestgreen", "Lezgin" = "springgreen4", "North_Ossetian" = "lightpink4",
    "Jew_Ashkenazi" = "chocolate", "Jew_Georgian" = "darkturquoise",
    "Jew_Iranian" = "dodgerblue4", "Jew_Iraqi" = "slateblue", "Jew_Libyan" = "cornflowerblue",
    "Jew_Moroccan" = "limegreen", "Jew_Tunisian" = "darkred", "Jew_Turkish" = "seagreen4",
    "Jew_Yemenite" = "navyblue", "Basque" = "darkorchid4", "Spanish" = "darkorchid",
    "Spanish_North" = "mediumseagreen", "Druze" = "slateblue4", "Lebanese" = "springgreen3",
    "Belarusian" = "darkturquoise", "Croatian" = "blue", "Czech" = "darkslateblue",
    "Estonian" = "darkslategray4", "Hungarian" = "darkorange3", "Lithuanian" = "tan4",
    "Ukrainian" = "tan", "Canary_Islands" = "navy", "Sardinian" = "darkseagreen4",
    "Finnish" = "olivedrab", "Mordovian" = "darkorange1", "Russian" = "red"
)

# Assign unique filled shapes to each group (cycling through the available filled shapes)
filled_shapes <- c(21, 22, 23, 24, 25)  # Circle, square, diamond, up-triangle, down-triangle
shape_values <- rep(filled_shapes, length.out = length(unique(pca_data$Group)))

# Plot the PCA with dark colors, different filled shapes to distinguish samples, and a black border around the PCA
ggplot(pca_data, aes(x = PC1, y = PC2, color = Group, fill = Group, shape = Group)) +
    geom_point(size = 3) +
    scale_color_manual(values = custom_colors) +
    scale_fill_manual(values = custom_colors) +
    scale_shape_manual(values = shape_values) +
    labs(
        title = "PCA Projection of Modern West Eurasia (AADR_HO v62.0 merged with Jovialis WGS 30x)",
        x = paste0("PC1 (", round(evals[1] / sum(evals) * 100, 2), "% variance)"),
        y = paste0("PC2 (", round(evals[2] / sum(evals) * 100, 2), "% variance)")
    ) +
    theme_minimal() +
    theme(
        legend.position = "bottom",
        legend.title = element_blank(),
        legend.text = element_text(size = 8),  # Decrease legend text size
        legend.key.size = unit(0.4, "cm"),  # Decrease size of legend keys
        legend.spacing.x = unit(0.2, "cm"),  # Decrease horizontal spacing in legend
        legend.box = "horizontal",  # Arrange legend items horizontally
        legend.direction = "horizontal",
        plot.title = element_text(hjust = 0.5),
        panel.border = element_rect(color = "black", fill = NA, linewidth = 1)  # Add black border around the PCA plot
    ) +
    guides(
        color = guide_legend(ncol = 8),
        shape = guide_legend(ncol = 8),  # Make sure the shape legend is also compact
        fill = guide_legend(ncol = 8)  # Make sure the fill legend is also compact
    )
 
On the v54.1 vs. v62 question: the readme file that came with the v62.0 set implies that this is the most refined version:

• "Twist" sequencing is a significant update in our protocol for capture and data for all newer samples are captured this way, as described in [RohlandMallickGenomeResearch2022]. These samples are indicated in the genetic ids with "TW", in contrast to our older Agilent sequencing, marked "AG". One significant advantage is that twist capture reduces bias when co-analysed with shotgun data,
• the pseudo-haploid calling procedure has been updated to objectively determine thresholding parameters based on error rates (technical note to follow),
• 13442 poor performing SNPs were dropped from the Human Origins array, compared with previous releases.
 
I need your help, boys. I am trying to merge my own data with the dataset so I can model myself in qpAdm. I converted my AncestryDNA file to 23andMe format, then I used this command:

plink --23file AncestryCombined.txt --make-bed --out mydata

And I got files .bed, .bim, .fam, .hh and .log.
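For anyone wondering what the AncestryDNA-to-23andMe conversion involves, here is a rough sketch. It assumes the usual AncestryDNA layout (tab-separated rsid, chromosome, position, allele1, allele2, with "#" comment lines, one header row and "0" for no-calls) and writes the four-column 23andMe layout. `ancestry_to_23andme` is a hypothetical helper and the toy file is made up; check both assumptions against your own raw file.

```shell
#!/bin/sh
# ancestry_to_23andme IN OUT: merge the two allele columns into one genotype column.
# Hypothetical helper, assuming the AncestryDNA column layout described above.
ancestry_to_23andme() {
    awk -F '\t' '
        { sub(/\r$/, "") }                 # drop a Windows line ending if present
        /^#/ { next }                      # skip comment lines
        $1 == "rsid" { next }              # skip the column header
        {
            gt = $4 $5
            if ($4 == "0" || $5 == "0") gt = "--"   # write no-calls the 23andMe way
            printf "%s\t%s\t%s\t%s\n", $1, $2, $3, gt
        }
    ' "$1" > "$2"
}

# Demo on a two-SNP toy file (made-up data).
work=$(mktemp -d)
printf '#AncestryDNA raw data download\n' > "$work/ancestry.txt"
printf 'rsid\tchromosome\tposition\tallele1\tallele2\n' >> "$work/ancestry.txt"
printf 'rs1\t1\t100\tA\tG\n' >> "$work/ancestry.txt"
printf 'rs2\t1\t200\t0\t0\n' >> "$work/ancestry.txt"

ancestry_to_23andme "$work/ancestry.txt" "$work/converted.txt"
cat "$work/converted.txt"
```

The converted file is what you would then pass to plink --23file.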

Now I used this command:

plink --allow-no-sex --bfile v62.0_1240k_public --bmerge mydata --out v62.0_1240k_public_mydata

And I got this:

PLINK v1.9.0-b.7.7 64-bit (22 Oct 2024) cog-genomics.org/plink/1.9/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to v62.0_1240k_public_mydata.log.
Options in effect:
--allow-no-sex
--bfile v62.0_1240k_public
--bmerge mydata
--out v62.0_1240k_public_mydata

5886 MB RAM detected; reserving 2943 MB for main workspace.
Error: Failed to open v62.0_1240k_public.fam.
Jalisciense@vbox:~/Downloads>

I got a file named v62.0_1240k_public_mydata.log, but why does it say "Failed to open v62.0_1240k_public.fam"? What is wrong?

When I open the v62.0_1240k_public_mydata.log file, it looks like this:

ANZ81Ql.jpg
 
I tried to change it to the bin folder, but still no luck.

All my .bed, .bim, .fam, .hh and .log files are in the bin folder:

cYPDOIV.jpg


This is my bin folder:

b0YsMNp.jpg
 
When merging you need the --make-bed flag before the --out flag to create the merged file.
 
I don't think I understand. Is this right?

plink --23file AncestryCombined.txt --make-bed --out mydata
 
Yeah but put it in your merging command as well so it should be like:
plink --allow-no-sex --bfile v62.0_1240k_public --bmerge mydata --make-bed --out v62.0_1240k_public_mydata
 

Now I am getting this error:

Error: Failed to open v62.0_1240k_public.bed.

DuV8Asg.jpg


Is this the cause, or what? (Type: Unknown)

t2PwWtG.jpg
 
Okay, I see the problem now. In your working directory it looks like you only have v62.0_1240k_public in PACKEDANCESTRYMAP format (similar to EIGENSTRAT). You need to convert it to PACKEDPED (PLINK binary format) using convertf so that you have the .bed, .bim and .fam files.
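A quick way to confirm which format a dataset prefix is in before running the merge. This is only a sketch: `check_fileset` is a hypothetical helper (not part of PLINK or EIGENSOFT), and it assumes the usual extensions, .bed/.bim/.fam for PLINK binary and .geno/.snp/.ind for the AADR download. It looks at file names only, not at file contents.

```shell
#!/bin/sh
# check_fileset PREFIX: report which genotype file formats exist for PREFIX.
# Hypothetical helper; it only checks file extensions.
check_fileset() {
    prefix=$1
    if [ -f "$prefix.bed" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
        echo "plink"         # ready for --bfile / --bmerge
    elif [ -f "$prefix.geno" ] && [ -f "$prefix.snp" ] && [ -f "$prefix.ind" ]; then
        echo "eigenstrat"    # needs convertf before PLINK can open it
    else
        echo "missing"
    fi
}

# Demo on empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/aadr.geno" "$demo_dir/aadr.snp" "$demo_dir/aadr.ind"
touch "$demo_dir/mydata.bed" "$demo_dir/mydata.bim" "$demo_dir/mydata.fam"

aadr_format=$(check_fileset "$demo_dir/aadr")
mydata_format=$(check_fileset "$demo_dir/mydata")
none_format=$(check_fileset "$demo_dir/nothing")
echo "aadr: $aadr_format, mydata: $mydata_format, nothing: $none_format"
```

Running `check_fileset v62.0_1240k_public` in the folder where you call plink should print "plink" before --bfile can open the dataset.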
 
CJYO2vA.jpg


Then I used this command:

./plink -p par.EIGENSTRAT.PED

But now I am getting this error:

jalisciense@vbox:~/bin> ./plink -p par.EIGENSTRAT.PED
PLINK v1.9.0-b.7.7 64-bit (22 Oct 2024) cog-genomics.org/plink/1.9/
© 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--p par.EIGENSTRAT.PED

Error: Unrecognized flag ('-p').
For more information, try "plink --help <flag name>" or "plink --help | more".
jalisciense@vbox:~/bin>

What's going on?
 
No, -p is a convertf option, not a PLINK one. convertf is part of the EIGENSOFT suite, so you need to have that installed.
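Once EIGENSOFT is installed, the conversion could look roughly like the sketch below. The parameter names are convertf's own; the file names assume the v62.0 1240K download sits in the current directory, and the parameter file name is just a label I picked, so adjust both to your setup.

```shell
#!/bin/sh
# Write a convertf parameter file. File names are assumptions: they expect
# v62.0_1240k_public.geno/.snp/.ind in the current directory.
cat > par.PACKEDANCESTRYMAP.PACKEDPED <<'EOF'
genotypename:    v62.0_1240k_public.geno
snpname:         v62.0_1240k_public.snp
indivname:       v62.0_1240k_public.ind
outputformat:    PACKEDPED
genotypeoutname: v62.0_1240k_public.bed
snpoutname:      v62.0_1240k_public.bim
indivoutname:    v62.0_1240k_public.fam
EOF

# Run convertf (not plink) with -p, but only if it is installed and on the PATH.
if command -v convertf >/dev/null 2>&1; then
    convertf -p par.PACKEDANCESTRYMAP.PACKEDPED
else
    echo "convertf not found: install EIGENSOFT first" >&2
fi
```

If the conversion works, the three output files are the ones the earlier --bfile v62.0_1240k_public call was failing to open.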
 
Do you mean this?


Because the first time I downloaded this:

 
@Jovialis did you delete a post about the numoutevec setting having to be 15? What exactly does it do?
Yeah, I tested it out and it is not optimal. Frankly, for projecting aDNA, I believe 1240K (more SNPs) with numoutevec: 10 is optimal. I haven't had a chance to do it yet, but that is what most studies do. The aDNA samples were not plotting correctly in HO: they were too "Western" on the PCA relative to modern populations, even the Imperial Roman and Anatolian ChL and BA samples, which are certainly not western. I thought changing numoutevec would fix this, but it only made the modern samples plot less optimally and didn't do much to fix the aDNA samples.

The only downside is that 1240K has a less than comprehensive modern pop set. But I think academic studies include supplemental modern pops from other studies, that require approval from their sources, so not truly "public". Nevertheless, I do think some studies merge HO with 1240K, but I haven't had time to verify that, nor figure out how it is done.
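For reference, numoutevec sets how many eigenvectors (PCs) smartpca writes to the .evec file. A projection run along the lines discussed here might use a parameter file like the sketch below. The parameter names are smartpca's own; the input file names and modern_pops.txt (one population label per line, the populations used to compute the axes) are placeholders I made up, while the two output names match the R script earlier in the thread.

```shell
#!/bin/sh
# Write a smartpca parameter file. merged.* and modern_pops.txt are
# placeholder names: point them at your own merged dataset and population list.
cat > par.smartpca <<'EOF'
genotypename: merged.geno
snpname:      merged.snp
indivname:    merged.ind
evecoutname:  projected.evec.txt
evaloutname:  projected.eval.txt
poplistname:  modern_pops.txt
lsqproject:   YES
numoutevec:   10
EOF

# Run smartpca only if it is installed and on the PATH.
if command -v smartpca >/dev/null 2>&1; then
    smartpca -p par.smartpca
else
    echo "smartpca not found: install EIGENSOFT first" >&2
fi
```

With poplistname set, only the listed populations define the axes and everyone else (for example the aDNA samples) is projected onto them; lsqproject: YES is the option meant for projecting samples with a lot of missing data.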
 
uh, now I understand something...
Yeah, that's what they do. There are a lot of modern (and ancient) samples from other studies.
Check the supplementary data of the studies, for example:

1732294181053.png

(from here: https://www.cell.com/cms/10.1016/j....b751f689-49c6-47bb-beb0-ab140c34922e/mmc2.pdf)
There are also reference panels you can pick samples from, like 1000 Genomes Phase 3, shown in the image above.
By the way, what do you think about imputation and phasing in the context of ADMIXTOOLS or any other ancestry-related analysis?
 
I finally have the process documented, starting from my hg19-aligned 23andMe txt file, which I produced last year when I processed my data from FASTQ:


Code:
# Step 1: Convert the 23andMe file to PLINK binary format.
plink --23file /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked_23andMe_V3.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked

# Step 2a: Extract SNPs from the Jovialis dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --write-snplist --out /mnt/d/UbuntuJovialisHome/jovialis_snp_list

# Step 2b: Extract SNPs from the AADR (v62.0_HO_public) dataset.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --write-snplist --allow-no-sex --out /mnt/d/UbuntuJovialisHome/v62_snp_list

# Step 3: Find common SNPs between the two datasets (Jovialis and AADR).
comm -12 <(sort /mnt/d/UbuntuJovialisHome/jovialis_snp_list.snplist) <(sort /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist) > /mnt/d/UbuntuJovialisHome/common_snps.txt

# Step 4: Filter the Jovialis dataset to keep only SNPs present in both datasets (common SNPs).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_sorted_marked --extract /mnt/d/UbuntuJovialisHome/common_snps.txt --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_common_snps

# Step 5: Attempt an initial merge against the full AADR dataset to identify problematic SNPs (multiallelic or inconsistent strand).
# If PLINK finds any, it lists them in v62_Jovialis_merged-merge.missnp.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged

# Step 6: Flip problematic SNPs in the Jovialis dataset (fix strand inconsistencies).
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_common_snps --flip /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_flipped

# Step 7: Exclude remaining problematic SNPs from the Jovialis dataset.
# Note: this reuses the Step 5 list, so every SNP flipped in Step 6 is also dropped here.
# To keep the SNPs that flipping fixed, repeat the Step 5 trial merge with Jovialis_flipped and exclude the new .missnp list instead.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/Jovialis_flipped --exclude /mnt/d/UbuntuJovialisHome/v62_Jovialis_merged-merge.missnp --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered

# Step 8: Filter the Jovialis dataset (the Step 7 output) to keep only SNPs present in AADR, using the SNP list written in Step 2b.
plink --bfile /mnt/d/UbuntuJovialisHome/Jovialis_filtered --extract /mnt/d/UbuntuJovialisHome/v62_snp_list.snplist --make-bed --out /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR

# Step 9: Perform the final merge, ensuring all SNPs from AADR are kept and only SNPs matching AADR from Jovialis are merged.
plink --allow-no-sex --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --bmerge /mnt/d/UbuntuJovialisHome/Jovialis_filtered_for_AADR --make-bed --out /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final

# Step 10a: Check SNP frequency in the final merged dataset to verify all SNPs from AADR were retained.
plink --bfile /mnt/d/UbuntuJovialisHome/v62_Jovialis_corrected_final --freq --out snp_check

# Step 10b: Check SNP frequency in the original AADR dataset for comparison.
plink --bfile /mnt/d/UbuntuJovialisHome/v62.0_HO_public --freq --out aadr_check

GAybZqw.png
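To see what the comm -12 call in Step 3 does, here is a toy run on two made-up SNP lists. The rs IDs and file names are invented for the demo. It sorts into temporary files instead of using <( ... ), so it also runs in a plain POSIX shell.

```shell
#!/bin/sh
# Toy SNP ID lists standing in for the two .snplist files (IDs are made up).
work=$(mktemp -d)
printf '%s\n' rs3 rs1 rs9 rs5 > "$work/sample.snplist"
printf '%s\n' rs5 rs2 rs1 rs7 > "$work/aadr.snplist"

# comm needs both inputs sorted under the same collation, so pin the locale.
LC_ALL=C sort "$work/sample.snplist" > "$work/sample.sorted"
LC_ALL=C sort "$work/aadr.snplist"   > "$work/aadr.sorted"

# -12 suppresses the lines unique to file 1 and to file 2, leaving the intersection.
LC_ALL=C comm -12 "$work/sample.sorted" "$work/aadr.sorted" > "$work/common_snps.txt"

common_count=$(wc -l < "$work/common_snps.txt" | tr -d ' ')
echo "common SNPs: $common_count"
cat "$work/common_snps.txt"
```

Only rs1 and rs5 appear in both toy lists, so those are the two IDs that end up in common_snps.txt.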
VERY IMPORTANT UPDATE:

I was in the process of re-following my guide to merge my WGS 30x sample with AADR 1240K, and I found a couple of mistakes in it, made by the stupid AI hallucinating part of the process.

--allow-no-sex was added to step 2b. (You can add it to 2a as well, or you can just edit the .fam file to record your proper sex.)

Also, and this is crucial, I eliminated 4b: You absolutely do NOT want to filter the AADR to common SNPs found between your sample and AADR. Your sample must defer to AADR!

I have modified the original post accordingly. Apologies for any inconvenience this may have caused people.
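One way to check that the merged dataset really kept every AADR SNP is to compare the variant IDs in the two .bim files. This is a sketch, not part of the guide: `same_snp_set` is a hypothetical helper, and the toy .bim files below are made up. It relies only on the fact that column 2 of a .bim file is the variant ID.

```shell
#!/bin/sh
# same_snp_set A.bim B.bim: succeed if both .bim files list the same variant IDs.
# Hypothetical helper; column 2 of a .bim file is the variant ID.
same_snp_set() {
    a_ids=$(mktemp)
    b_ids=$(mktemp)
    awk '{print $2}' "$1" | LC_ALL=C sort -u > "$a_ids"
    awk '{print $2}' "$2" | LC_ALL=C sort -u > "$b_ids"
    cmp -s "$a_ids" "$b_ids"
    status=$?
    rm -f "$a_ids" "$b_ids"
    return $status
}

# Toy .bim files: one merge that kept all SNPs, one that lost a SNP.
work=$(mktemp -d)
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n2\trs3\t0\t300\tG\tA\n' > "$work/aadr.bim"
cp "$work/aadr.bim" "$work/merged_ok.bim"
head -n 2 "$work/aadr.bim" > "$work/merged_filtered.bim"

if same_snp_set "$work/aadr.bim" "$work/merged_ok.bim"; then ok_result=same; else ok_result=different; fi
if same_snp_set "$work/aadr.bim" "$work/merged_filtered.bim"; then bad_result=same; else bad_result=different; fi
echo "full merge: $ok_result, filtered merge: $bad_result"
```

On real data you would call it with the original AADR .bim and the .bim of the final merged dataset; "different" means the merge dropped (or added) SNPs relative to AADR.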
 