admixtools2 TUTORIAL for WINDOWS.

Some instructions and examples are available here:


Are you familiar with the convertf command to convert the Reich dataset files into PLINK format?
 
Can you give me the command for windows command prompt?

The only way to do it is from Linux. I don't think there is a Windows version; at least, I don't use one.
I use an Ubuntu VM for the conversion. The command is

convertf -p parameters_file

In parameters_file you specify the names of the files to be converted and the output file names.
In addition, the latest datasets are huge, so your computer should have a lot of RAM. I use a laptop with 16 GB.
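
Before running anything, it may help to confirm that the tools are actually reachable from the Linux shell (VM or WSL). This is only a minimal sketch; the helper name check_tool is made up for illustration. If convertf is reported missing, it has to be installed first.

```shell
#!/usr/bin/env bash
# Report whether a command-line tool is on PATH.
# check_tool is a hypothetical helper name, not part of EIGENSOFT or PLINK.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

check_tool convertf
check_tool plink
```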
 

I think it can be done from Windows as well. I did it with Plink2.

Sadly I lost access to my old Discord, where a couple of fellows helped me pull the plink thing off. I basically converted my raw data to plink, converted the dataset to plink, merged them, and ran the script off the new plink dataset. To be fair, I must have messed up somewhere, since the results of the runs were quite out there, but the people helping me out were quite sure I did the steps right.

I'll try to get in touch with them, they might have the screenshots of the steps/code on their end.
Teaches me to keep better documentation...

Edit: Some of the code is on post95 in this thread, might help, but I think that version had some errors. I realized I did not really make use of convertf... at least from that screen.


Anyways some of the resources I used:
https://zzz.bwh.harvard.edu/plink/dataman.shtml
https://www.cog-genomics.org/plink/
 

plink is available for both Linux and Windows. However, the question was about convertf.
convertf is Linux only. There are other tools to convert from one format to another, but convertf is the easiest way.
 

I see. I will give it a try this way from a VM, since using the method I described gave me quite weird results. Thanks for the tip.
 


Yeah, I use it on Ubuntu WSL too.

Do you convert the Reich dataset to PLINK format (.ped/.bed) and then merge, or do you convert your raw DNA file into EIGENSTRAT and then merge both in EIGENSTRAT?
 

Yes, both conversions are possible. Templates for the parameter files are provided with Eigensoft:

https://github.com/argriffing/eigensoft/tree/master/CONVERTF

echo "
genotypename: full230.geno
snpname: full230.snp
indivname: full230.ind
outputformat: PACKEDPED
genotypeoutname: full230.bed
snpoutname: full230.bim
indivoutname: full230.fam
" >par_full230.par


convertf -p par_full230.par
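
After a conversion like this, a quick sanity check is to compare line counts: the .bim should normally have as many lines as the .snp, and the .fam as many as the .ind. This is a rough sketch; same_line_count is a made-up helper name. A mismatch is not necessarily an error, since convertf can drop SNPs or individuals depending on the options used, but it is worth a look before merging.

```shell
#!/usr/bin/env bash
# Compare the line counts of two files and report whether they match.
# same_line_count is a hypothetical helper, not part of convertf.
same_line_count() {
    local a b
    a=$(( $(wc -l < "$1") ))   # arithmetic expansion strips padding from wc
    b=$(( $(wc -l < "$2") ))
    if [ "$a" -eq "$b" ]; then
        echo "OK: $1 and $2 both have $a lines"
    else
        echo "MISMATCH: $1 has $a lines, $2 has $b lines"
        return 1
    fi
}

# Only run the checks when the converted files are present.
if [ -f full230.bim ] && [ -f full230.fam ]; then
    same_line_count full230.snp full230.bim
    same_line_count full230.ind full230.fam
fi
```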


For the other direction (PLINK .ped to EIGENSTRAT), the parameter file looks like this:

genotypename: example.ped
snpname: example.pedsnp # or example.map, either works
indivname: example.pedind # or example.ped, either works
outputformat: EIGENSTRAT
genotypeoutname: example.eigenstratgeno
snpoutname: example.snp
indivoutname: example.ind
familynames: NO
 


I used plink in the Windows cmd for the plink stuff.

Ubuntu WSL for admixtools1 and convertf.

The only rough spot is having 100 GB available for the conversion of the v54.1_1240K EIGENSTRAT into .ped.
 

Yeah, had the same problem. Short of getting more storage, you could limit the v54 to the samples you plan to run, which is what I did.

If either you or bgtrak could make a short documentation for using convertf that would be great. I am really curious to see if I messed something up earlier, cause my results were pretty wild.
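
One way to limit the dataset to the populations you plan to run is convertf's poplistname parameter, which keeps only the populations listed in a file. The sketch below is not necessarily how the poster did it. The file names (dataset.*, subset.*, poplist.txt, par_subset.par) and the labels Pop_A and Pop_B are placeholders, and the labels must match the population column of your .ind file.

```shell
#!/usr/bin/env bash
# Populations to keep, one label per line (hypothetical labels).
cat > poplist.txt <<'EOF'
Pop_A
Pop_B
EOF

# Parameter file: same layout as a normal conversion, plus poplistname.
# "dataset" is a placeholder prefix for the downloaded .geno/.snp/.ind files.
cat > par_subset.par <<'EOF'
genotypename:    dataset.geno
snpname:         dataset.snp
indivname:       dataset.ind
outputformat:    PACKEDPED
genotypeoutname: subset.bed
snpoutname:      subset.bim
indivoutname:    subset.fam
poplistname:     poplist.txt
EOF

# Run the conversion only if convertf and the input are available.
if command -v convertf >/dev/null 2>&1 && [ -f dataset.geno ]; then
    convertf -p par_subset.par
else
    echo "convertf or dataset.geno not available; wrote par_subset.par only"
fi
```

Converting only the subset also keeps the output small, which helps with the disk space problem mentioned above.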
 

Actually, you don't need 100 GB for the conversion, but you do need a lot of RAM. The conversion may take some time, maybe half an hour or more. On a slow computer it may not run at all, or it may freeze. Converting .geno to packed PLINK output (.bed) produces a file of about the same size, near 5 GB; a plain-text .ped comes out much larger. If your Linux is relying heavily on swap space, that is your weak point. Make sure to allocate plenty of RAM, since it is needed, and do the conversion on the most powerful PC you have.
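
The two size figures in this discussion may both be right, depending on the output format. Here is a back-of-the-envelope sketch, assuming 2 bits per genotype for packed files (.geno, .bed) and roughly 4 bytes per genotype for a text .ped. The individual and SNP counts are round, made-up numbers, and estimate_sizes is a hypothetical helper. The last two commands just print memory, swap and free disk space.

```shell
#!/usr/bin/env bash
# Rough size estimates in GiB for a dataset with N individuals and M SNPs.
# Assumptions: packed formats use 2 bits per genotype; a text .ped uses
# about 4 bytes per genotype (two allele characters plus separators).
estimate_sizes() {
    local n_ind=$1 n_snp=$2
    awk -v n="$n_ind" -v m="$n_snp" 'BEGIN {
        gib = 1024 * 1024 * 1024
        printf "packed (.geno/.bed): %.1f GiB\n", n * m / 4 / gib
        printf "text .ped:           %.1f GiB\n", n * m * 4 / gib
    }'
}

# Example with round, made-up numbers: 16000 individuals, 1200000 SNPs.
estimate_sizes 16000 1200000

# Memory and swap (Linux only), then free space in the current directory.
if command -v free >/dev/null 2>&1; then free -h; fi
df -h .
```

With these assumptions the text .ped is about 16 times the size of the packed file, so writing PACKEDPED instead of PED is the easier route when disk space is tight.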
 


It's not a big deal; messing around with plink and its inconsistency errors is more ball-busting.

Download something like Ubuntu WSL for Windows, then download convertf from GitHub, then just use the command

Code:
convert -f par.EIGENSTRAT.ped

par.EIGENSTRAT.ped is the par file with the instructions for the files in your directory.

Mine looked like that for example:

Code:
genotypename:    dosasv54merged.geno
snpname:         dosasv54merged.snp
indivname:       dosasv54merged.ind
outputformat:    PED
genotypeoutname: dosasv54merged.ped
snpoutname:      dosasv54merged.pedsnp
indivoutname:    dosasv54merged.pedind

This is how you can convert your original 1240K or HO file into PLINK format.

After that, you can use the following command to convert it to .bed, which is what I do, and then merge it with your own .bed files, generated by plink from your own DNA file(s).

Code:
plink --file dosasv54merged --make-bed --out dosasv54merged

The above will turn the .ped files into .bed; then you can go ahead and merge with plink again using --bmerge.

For example:

Code:
plink --allow-no-sex --bfile dosasv54merged --bmerge newfiletobemerged --out dosasv54merged2
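
For the "your own .bed files" part, here is a sketch of one possible way to get there, assuming the raw file is a 23andMe-style export, which PLINK can read with --23file. The names raw_dna.txt, MyFam and MySample are placeholders. Exports from other vendors may need to be reformatted first.

```shell
#!/usr/bin/env bash
# Hypothetical names: raw_dna.txt is a 23andMe-style raw data file;
# MyFam / MySample become the family and individual IDs in the .fam file.
RAW=raw_dna.txt
OUT=newfiletobemerged
CMD="plink --23file $RAW MyFam MySample --make-bed --out $OUT"

# Run the conversion only if plink and the raw file are available.
if command -v plink >/dev/null 2>&1 && [ -f "$RAW" ]; then
    $CMD
else
    echo "plink or $RAW not available; would run: $CMD"
fi
```

The resulting newfiletobemerged.bed/.bim/.fam is what the --bmerge command above expects.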
 
Keep in mind that you can use PLINK file formats with admixtools2 in RStudio, so there is no need to convert back to .geno if you cba.
 
Thanks bro, will give it a try. Hopefully I won't get nonsensical results this time.
 

Sorry, the convertf command is wrong, the correct one is:

Code:
convertf -p par.EIGENSTRAT.ped

PS. If you get incos. errors with plink, let me know and I'll give you the rundown of how to bypass it.
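
The poster's own rundown is not in the thread. The sketch below shows the usual PLINK 1.9 approach for a merge that fails on variants with 3+ alleles, which may or may not be what is meant here. It assumes the failed merge left a dosasv54merged2-merge.missnp file behind. The _flipped and _clean suffixes are made up.

```shell
#!/usr/bin/env bash
# File prefixes from the merge example above.
REF=dosasv54merged
MINE=newfiletobemerged
OUT=dosasv54merged2
MISSNP="$OUT-merge.missnp"   # list of offending variants from the failed merge

if command -v plink >/dev/null 2>&1 && [ -f "$MISSNP" ]; then
    # 1) Flip strand for the offending variants in your own file.
    plink --bfile "$MINE" --flip "$MISSNP" --make-bed --out "${MINE}_flipped"
    # 2) Retry the merge; if it fails again, drop the remaining offenders
    #    from both sides and merge once more.
    if ! plink --allow-no-sex --bfile "$REF" --bmerge "${MINE}_flipped" --out "$OUT"; then
        plink --bfile "$REF" --exclude "$MISSNP" --make-bed --out "${REF}_clean"
        plink --bfile "${MINE}_flipped" --exclude "$MISSNP" --make-bed --out "${MINE}_clean"
        plink --allow-no-sex --bfile "${REF}_clean" --bmerge "${MINE}_clean" --out "$OUT"
    fi
else
    echo "No $MISSNP found (or plink missing); nothing to fix"
fi
```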
 

It is not a good idea to use .ped as the extension for the parameter file. You may use .txt or .par instead.
.ped is the extension PLINK uses for its text genotype files. Just try to avoid that kind of mess. The original software from GitHub or the download includes excellent examples of parameter files, which is why I did not provide more on that.
 


I should have mentioned that if you want to edit these files, you can just go ahead and download Visual Studio Code for windows.


https://code.visualstudio.com/


Otherwise, just do what bgtrack says.
 
I would like to add that people can use plink files perfectly fine with admixtools2 in RStudio, one of the perks of the 2nd ed.

.bed would be the equivalent to .geno, .bim to .snp and .fam to .ind.

The process is exactly the same.

The only thing you need is a text editor like Visual Studio Code, which I use, to read and edit the index .fam file.
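
Editing the .fam file can also be scripted instead of done by hand. This is a small sketch. It assumes that admixtools2 takes the population label from the first column of the .fam file, which you should check against the admixtools documentation. set_fam_pop, demo.fam, the sample IDs and MyPop are all made up.

```shell
#!/usr/bin/env bash
# Set the first column (family ID) for one sample in a .fam file.
# Usage: set_fam_pop FILE SAMPLE_ID NEW_LABEL > NEW_FILE
# set_fam_pop is a hypothetical helper, not part of PLINK or admixtools.
set_fam_pop() {
    awk -v id="$2" -v pop="$3" '$2 == id { $1 = pop } { print }' "$1"
}

# Tiny made-up .fam file: FID IID father mother sex phenotype.
printf '1 SampleA 0 0 1 1\n2 SampleB 0 0 2 1\n' > demo.fam
set_fam_pop demo.fam SampleB MyPop
```

The changed line is written with single spaces between columns, and all other lines are passed through untouched. Write the output to a new file and keep the original .fam as a backup.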
 
