Admixtools Using Admixtools2 to model admixture

Tautalus · Jan 19, 2024

In the previous post I said that the trickiest part of all this is merging our own data with the Reich Lab data.
I was wrong. The software is easy; the trickiest part is finding the best combinations of populations for qpAdm.
It takes test after test to find them. Luckily I had studies to base my choices on.
For this analysis I relied mainly on the study "The genomic history of the Iberian Peninsula over the past 8000 years" by Olalde et al. (2019).

So far the best values I've found for myself are these:
51% from Portugal EBA (a subset I analyzed separately)
30% from a population with France_Beaker-like ancestry (probably related to the LBA Urnfield culture)
14% from Italy_Imperial (central/eastern Mediterranean ancestry arrived during the Roman Empire)
3% Iberomaurusian

Code:

library(admixtools)
prefix = 'merged'  # assumed path prefix of the merged .geno/.snp/.ind files; adjust to your own
target = 'Tautalus'
left = c('Portugal_EBA', 'France_BellBeaker', 'Italy_Imperial_o5.SG', 'Morocco_Iberomaurusian_TAF012')
right = c('Mbuti.DG', 'Ethiopia_4500BP.SG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1', 'France_NouvelleAquitaine_Mesolithic.SG', 'Italy_North_Villabruna_HG', 'Karitiana.DG', 'Papuan.DG', 'Iran_GanjDareh_N', 'Russia_Boisman_MN', 'Czech_CordedWare', 'Netherlands_EIA', 'Turkey_Arslantepe_LateC', 'Israel_C', 'ONG.SG')
results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights  # estimated admixture proportions for the left populations
results$popdrop  # fits of the reduced models with source populations dropped

Portugal EBA itself can be modeled as 68% Chalcolithic Portugal and 32% Germany BellBeaker, and should represent the main ancestry of the Lusitanian peoples. I used the same reference populations for that model.

Who Cares? · Jan 21, 2024

Tautalus said:
The trickiest part of all this is the merging of our own data with the data from the Reich Lab.

There are several ways of doing it; here is one of the fastest and simplest.

It consists of converting your raw data from 23andMe format to BED format, then to geno (EIGENSTRAT) format, and then merging it with the Reich data.

All these instructions assume that the programs and packages needed, in DOS or in WSL, are already installed.
The file names can be whatever you want.

1) Convert raw data in 23andMe format to a bed file (DOS session)
plink --allow-no-sex --alleleACGT --23file 23andMe.txt --make-bed --out outfile

This will produce 3 essential files: a .bed, a .bim and a .fam file. In the .fam file you can replace the -9 in the last column (the phenotype) with 1.
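The -9 to 1 replacement can also be scripted instead of done in a text editor. This is a minimal sketch, assuming the .fam file has the usual six whitespace-separated columns with the phenotype last. The function name fix_fam_phenotype and the one-line demo file are made up for illustration.

```shell
# Hypothetical helper: rewrite the phenotype column (6th field) of a PLINK
# .fam file from -9 to 1 and write the result to a new file.
# Usage: fix_fam_phenotype input.fam output.fam
fix_fam_phenotype() {
  awk '{ if ($6 == "-9") $6 = "1"; print }' "$1" > "$2"
}

# Demo on a made-up one-sample .fam
# (columns: family ID, sample ID, father, mother, sex, phenotype)
fam_demo=$(mktemp -d)
printf 'FAM001 ID001 0 0 0 -9\n' > "$fam_demo/outfile.fam"
fix_fam_phenotype "$fam_demo/outfile.fam" "$fam_demo/outfile.fixed.fam"
cat "$fam_demo/outfile.fixed.fam"
```

Writing to a new file keeps the original .fam untouched, so the two can be compared before the fixed one is used in step 2.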

2) Convert the bed file to geno (EIGENSTRAT format) (WSL session)
You need a parameter file, with whatever name you want. I named mine par.BED2GENO.par; its contents are:

genotypename: outfile.bed
snpname: outfile.bim
indivname: outfile.fam
outputformat: EIGENSTRAT
genotypeoutname: outfile.geno
snpoutname: outfile.snp
indivoutname: outfile.ind

Once the parameter file is done, execute the command: convertf -p par.BED2GENO.par
This will produce 3 files: a .geno, an .ind and a .snp file. In the .ind file you can replace “Control” with your own name or alias.
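Renaming the label can be scripted too. The sketch below assumes the .ind file has three whitespace-separated columns (sample ID, sex, population label) and that the label to change is exactly Control. The function name rename_ind_label and the demo line are made up for illustration.

```shell
# Hypothetical helper: replace the population label "Control" (3rd column)
# of an EIGENSTRAT .ind file with a label of your choice.
# Usage: rename_ind_label input.ind NewLabel output.ind
rename_ind_label() {
  awk -v label="$2" '$3 == "Control" { $3 = label } { print }' "$1" > "$3"
}

# Demo on a made-up one-sample .ind (columns: sample ID, sex, label)
ind_demo=$(mktemp -d)
printf 'ID001 U Control\n' > "$ind_demo/outfile.ind"
rename_ind_label "$ind_demo/outfile.ind" Tautalus "$ind_demo/outfile.renamed.ind"
cat "$ind_demo/outfile.renamed.ind"
```

The label chosen here is the name that is later passed as the target to qpadm, so it should not clash with any population label already present in the Reich .ind file.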

3) Merge your data with the Reich data (WSL session)
You need a parameter file. I named mine par.MERGEGENO.par; its contents are:

geno1: outfile.geno
snp1: outfile.snp
ind1: outfile.ind
geno2: v54.1.p1_1240K_public.geno
snp2: v54.1.p1_1240K_public.snp
ind2: v54.1.p1_1240K_public.ind
genooutfilename: merged.geno
snpoutfilename: merged.snp
indoutfilename: merged.ind
outputformat: EIGENSTRAT
docheck: YES
hashcheck: YES
strandcheck: YES

Then execute the command: mergeit -p par.MERGEGENO.par

According to the documentation, mergeit merges two data sets into a third, which has the union of the individuals and the intersection of the SNPs of the first two. This means that the final merged file will only contain the SNPs that exist in both files; all the remaining SNPs are discarded. The merged file therefore has fewer SNPs than the original Reich data, though it is not smaller in size, and it has all the info you need to model your admixture.
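Before running the merge, it is possible to estimate how many SNPs should survive it. This is a rough sketch, assuming SNPs are matched on the identifier in the first column of each .snp file. The function name count_shared_snps and the toy SNP lines are made up; mergeit itself may keep fewer SNPs if its allele or strand checks reject some.

```shell
# Hypothetical helper: count SNP identifiers (1st column) that appear in
# both EIGENSTRAT .snp files.
# Usage: count_shared_snps first.snp second.snp
count_shared_snps() {
  awk 'NR == FNR { seen[$1] = 1; next }   # first file: remember every ID
       ($1 in seen) { n++ }               # second file: count IDs seen before
       END { print n + 0 }' "$1" "$2"
}

# Demo with toy files (columns: ID, chromosome, genetic pos, physical pos, ref, alt)
snp_demo=$(mktemp -d)
printf 'rs1 1 0.0 1000 A G\nrs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\n' > "$snp_demo/mine.snp"
printf 'rs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\nrs4 2 0.0 4000 T C\n' > "$snp_demo/reich.snp"
shared=$(count_shared_snps "$snp_demo/mine.snp" "$snp_demo/reich.snp")
echo "shared SNPs: $shared"
```

With real data the two arguments would be outfile.snp and the Reich .snp file named in the parameter file.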

I compared the test results of this file with the test results of a merged file with all of Reich's data plus my data, and they are identical.
The merge is the step that takes the longest, between half an hour and an hour depending on the computer.

And that's it. After this process the merged files are ready to be used by qpadm. You now have two datasets: the original Reich data, for testing all the populations, and your merged files, for testing your own admixture.

I assume you used Linux for step 2, since the only tools I could find are these:

argriffing/eigensoft (github.com/argriffing/eigensoft): principal components population genetics analysis on linux

and

DReichLab/EIG (github.com/DReichLab/EIG): Eigen tools by Nick Patterson and Alkes Price lab

Tautalus · Jan 21, 2024

Yes, WSL is Windows Subsystem for Linux.
Just run these commands to install convertf and mergeit:
sudo apt update
sudo apt -y install eigensoft

Jovialis · Jan 21, 2024

Working in Ubuntu WSL (or any terminal for that matter, like PowerShell) used to be extremely alien and esoteric to me. But ever since I started using AI to help me with it, I'm excited to use it and feel comfortable with it.

Who Cares? · Jan 21, 2024

Tautalus said:
Yes, WSL is Windows Subsystem for Linux.
Just run these commands to install convertf and mergeit:
sudo apt update
sudo apt -y install eigensoft

I tried both a VCF file and a 23andMe file, but neither merged properly for me (the merged.geno file is smaller than v54.1.p1_HO_public.geno).

Ivorix · Jan 22, 2024

Someone did mine too:

59.4 AHG
23.0 EHG
13.4 CHG
3.6 WHG
0.6 Iran_N

------------------------

55.2 EEF
38.0 Yamnaya
6.8 WHG

Tautalus · Jan 22, 2024

I merged two different raw data files. The one from 23andMe ended up smaller than the Reich file, and the one from Ancestry (in 23andMe format) ended up bigger. The Ancestry one is the file I was referring to in the post when I said that it got bigger, in size, than the Reich file.
It's bigger because it has many more SNPs in common with the Reich file: 379784 SNPs. It's the one I normally use with qpAdm.
My 23andMe file only had 137289 SNPs in common with the Reich file, which is why it is smaller in size. So it's normal.
If you have a file from Ancestry, it's better. If you only have one from 23andMe, then those are the only SNPs that qpAdm will work with when you're modeling your admixture.

Who Cares? · Jan 22, 2024

Ivorix said:
Someone did mine too:

59.4 AHG
23.0 EHG
13.4 CHG
3.6 WHG
0.6 Iran_N

------------------------

55.2 EEF
38.0 Yamnaya
6.8 WHG

I just want to do it so that we can end the stupid debate I had with some Albanian members in another thread.
Somebody there claimed that South Slavs living in areas near Albania (the western parts of North Macedonia and southeastern Serbia) killed off the Balkan population upon their arrival, and that South Slavs today don't have significant genetic heritage from the native Balkan people (let's call them that) who lived there before the Slavs appeared.

Who Cares? · Jan 22, 2024

Tautalus said:
I merged two different raw data files. The one from 23andMe ended up smaller than the Reich file, and the one from Ancestry (in 23andMe format) ended up bigger. The Ancestry one is the file I was referring to in the post when I said that it got bigger, in size, than the Reich file.
It's bigger because it has many more SNPs in common with the Reich file: 379784 SNPs. It's the one I normally use with qpAdm.
My 23andMe file only had 137289 SNPs in common with the Reich file, which is why it is smaller in size. So it's normal.
If you have a file from Ancestry, it's better. If you only have one from 23andMe, then those are the only SNPs that qpAdm will work with when you're modeling your admixture.

I have a VCF file with 95% coverage and ~2 million SNPs, about 400 MB in size.

I tried running:
plink --allow-no-sex --alleleACGT --vcf input.vcf.gz --make-bed --out outfile
but it did not work well, so I added the --aec flag to allow extended chromosomes, and I also restricted the run to chromosomes 1 to 22 only. This gave me working files, but when I ran mergeit I noticed that the merged file is smaller.

I also tried a 23andMe file, and the same thing happened when I ran mergeit.
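Restricting a VCF to chromosomes 1 to 22 can also be done before PLINK ever sees the file. This is a sketch of an alternative, not the command used above. It assumes an uncompressed, tab-separated VCF whose chromosome names are either plain numbers or prefixed with chr (a .vcf.gz would need to be decompressed first). The function name filter_autosomes and the toy records are made up.

```shell
# Hypothetical helper: keep the header plus only the records on chromosomes 1-22.
# Usage: filter_autosomes input.vcf output.vcf
filter_autosomes() {
  awk -F '\t' '
    /^#/ { print; next }                 # header lines pass through unchanged
    {
      chrom = $1
      sub(/^chr/, "", chrom)             # accept both "1" and "chr1"
      if (chrom ~ /^[0-9]+$/ && chrom + 0 >= 1 && chrom + 0 <= 22) print
    }' "$1" > "$2"
}

# Demo on a toy VCF with four records; only the first two are autosomal
vcf_demo=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\trs1\tA\tG\t.\tPASS\t.\n'
  printf '22\t200\trs2\tC\tT\t.\tPASS\t.\n'
  printf 'chrX\t300\trs3\tG\tA\t.\tPASS\t.\n'
  printf 'chrUn_random\t400\trs4\tT\tC\t.\tPASS\t.\n'
} > "$vcf_demo/input.vcf"
filter_autosomes "$vcf_demo/input.vcf" "$vcf_demo/autosomes.vcf"
grep -v '^#' "$vcf_demo/autosomes.vcf"
```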

Tautalus · Jan 22, 2024

Who Cares? said:
I have a VCF file with 95% coverage and ~2 million SNPs, about 400 MB in size.

I tried running:
plink --allow-no-sex --alleleACGT --vcf input.vcf.gz --make-bed --out outfile
but it did not work well, so I added the --aec flag to allow extended chromosomes, and I also restricted the run to chromosomes 1 to 22 only. This gave me working files, but when I ran mergeit I noticed that the merged file is smaller.

I also tried a 23andMe file, and the same thing happened when I ran mergeit.

Open the snp file with a text editor and check how many SNPs it has.
Mergeit only joins the common SNPs in both files.
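Counting can also be done from the command line, which is easier than a text editor for files with hundreds of thousands of lines. This is a small sketch, assuming one SNP per line with the chromosome in the second column, as in EIGENSTRAT .snp files. The function name snp_summary and the toy file are made up.

```shell
# Hypothetical helper: print the total number of SNPs in a .snp file,
# followed by the count per chromosome (2nd column), sorted numerically.
# Usage: snp_summary file.snp
snp_summary() {
  echo "total $(awk 'END { print NR }' "$1")"
  awk '{ n[$2]++ } END { for (c in n) print c, n[c] }' "$1" | sort -n
}

# Demo on a toy .snp file: two SNPs on chromosome 1, one on chromosome 2
summary_demo=$(mktemp -d)
printf 'rs1 1 0.0 1000 A G\nrs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\n' > "$summary_demo/merged.snp"
snp_summary "$summary_demo/merged.snp"
```

Running it on both the personal .snp file and the merged .snp file shows how many SNPs were lost in the merge and whether any chromosome dropped out entirely.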

I haven't worked with VCF files yet, so I don't know their structure.
Have you tried converting the VCF to 23andMe format and working with that file?
You can do this, for example, with the DNA Kit Studio from DNAGenics.

Who Cares? · Jan 22, 2024

Tautalus said:
Open the snp file with a text editor and check how many SNPs it has.
Mergeit only joins the common SNPs in both files.

I haven't worked with VCF files yet, so I don't know their structure.
Have you tried converting the VCF to 23andMe format and working with that file?
You can do this, for example, with the DNA Kit Studio from DNAGenics.

As I said, I tried with a 23andMe file as well. I can convert a BAM file to 23andMe format with WGS Extract, and I tried merging from that 23andMe file, but I had the same problem: the merged files were smaller than Reich's file.
I'll look into it in the next few days if I manage to find some time and let you know the outcome.

Tautalus · Jan 22, 2024

Who Cares? said:
As I said, I tried with a 23andMe file as well. I can convert a BAM file to 23andMe format with WGS Extract, and I tried merging from that 23andMe file, but I had the same problem: the merged files were smaller than Reich's file.
I'll look into it in the next few days if I manage to find some time and let you know the outcome.

OK, I had assumed the VCF and the 23andMe file were two different files from two different companies.

One thing I did to confirm that Mergeit worked well was to import all of Reich's SNPs and those in my files into an Access database and validate the common SNPs between them with SQL queries.
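The same kind of cross-check can be done without a database. The sketch below is not the Access/SQL procedure described above but a command-line stand-in for it, assuming SNPs are identified by the first column of each .snp file. The names snp_ids and verify_merge and the toy files are made up. A merged count below the intersection is not necessarily an error, since mergeit may drop SNPs that fail its own checks; a non-zero "unexpected" count would be the thing to investigate.

```shell
# Hypothetical helper: sorted, de-duplicated SNP IDs (1st column) of a .snp file
snp_ids() {
  awk '{ print $1 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: compare the SNPs in a merged file with the
# intersection of the two input files.
# Usage: verify_merge first.snp second.snp merged.snp
verify_merge() {
  work=$(mktemp -d)
  snp_ids "$1" > "$work/a.ids"
  snp_ids "$2" > "$work/b.ids"
  snp_ids "$3" > "$work/merged.ids"
  # comm -12 keeps only the IDs present in both sorted inputs
  LC_ALL=C comm -12 "$work/a.ids" "$work/b.ids" > "$work/expected.ids"
  # comm -13 keeps the IDs found in the merged file but not in the intersection
  LC_ALL=C comm -13 "$work/expected.ids" "$work/merged.ids" > "$work/unexpected.ids"
  echo "intersection=$(awk 'END { print NR }' "$work/expected.ids")" \
       "merged=$(awk 'END { print NR }' "$work/merged.ids")" \
       "unexpected=$(awk 'END { print NR }' "$work/unexpected.ids")"
  rm -r "$work"
}

# Demo with toy files: rs2 and rs3 are shared, and the merged file holds exactly those
check_demo=$(mktemp -d)
printf 'rs1 1 0.0 1000 A G\nrs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\n' > "$check_demo/mine.snp"
printf 'rs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\nrs4 2 0.0 4000 T C\n' > "$check_demo/reich.snp"
printf 'rs2 1 0.0 2000 C T\nrs3 2 0.0 3000 G A\n' > "$check_demo/merged.snp"
check_result=$(verify_merge "$check_demo/mine.snp" "$check_demo/reich.snp" "$check_demo/merged.snp")
echo "$check_result"
```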
