The only way to do it is from Linux. I don't think there is a Windows version; at least I don't use one.
I use an Ubuntu VM for the conversion. The command is
convertf -p parameters_file
In parameters_file you specify the filenames that should be converted and the output file names.
In addition: the latest datasets are huge, so your computer should have a lot of RAM. I use a laptop with 16 GB.
I think it can be done from Windows as well. I did it with Plink2.
Sadly I lost access to my old Discord, where a couple of fellows helped me pull the plink thing off. I basically converted my raw data to plink, converted the dataset to plink, merged them, and ran the script off of the new plink dataset. To be fair, I must have messed up somewhere, since the results of the runs were quite out there, but the people helping me out were quite sure I did the steps right.
I'll try to get in touch with them, they might have the screenshots of the steps/code on their end.
Teaches me to keep better documentation...
Edit: Some of the code is on post95 in this thread, might help, but I think that version had some errors. I realized I did not really make use of convertf... at least from that screen.
Anyways some of the resources I used:
https://zzz.bwh.harvard.edu/plink/dataman.shtml
https://www.cog-genomics.org/plink/
Yes, both conversions are possible. The templates for parameter files are provided with Eigensoft:
https://github.com/argriffing/eigens...aster/CONVERTF
echo "
genotypename: full230.geno
snpname: full230.snp
indivname: full230.ind
outputformat: PACKEDPED
genotypeoutname: full230.bed
snpoutname: full230.bim
indivoutname: full230.fam
" >par_full230.par
convertf -p par_full230.par
genotypename: example.ped
snpname: example.pedsnp # or example.map, either works
indivname: example.pedind # or example.ped, either works
outputformat: EIGENSTRAT
genotypeoutname: example.eigenstratgeno
snpoutname: example.snp
indivoutname: example.ind
familynames: NO
Yeah, had the same problem. Short of getting more storage, you could limit the v54 to the samples you plan to run, which is what I did.
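One way to do that limiting at conversion time, as a rough sketch: convertf accepts an optional poplistname parameter pointing at a file with one population label per line, and only individuals from those populations are written out. The file names (v54.*, keep_pops.txt, par_subset.par) are placeholders, and the two labels are just examples taken from this thread; use the labels that actually appear in your own .ind file.

```shell
#!/bin/sh
# Populations to keep, one label per line (example labels; edit to taste).
cat > keep_pops.txt <<'EOF'
Yoruba.DG
Armenia_LBA.SG
EOF

# Same layout as a normal conversion par file, plus poplistname.
# v54.* are placeholder file names, not the real dataset names.
cat > par_subset.par <<'EOF'
genotypename: v54.geno
snpname: v54.snp
indivname: v54.ind
outputformat: PACKEDPED
genotypeoutname: v54_subset.bed
snpoutname: v54_subset.bim
indivoutname: v54_subset.fam
poplistname: keep_pops.txt
EOF

# Only run the conversion when convertf and the input are actually present.
if command -v convertf >/dev/null 2>&1 && [ -f v54.geno ]; then
    convertf -p par_subset.par
else
    echo "convertf or v54.geno not found; wrote par_subset.par only"
fi
```

The subset files are much smaller than the full dataset, which helps with both disk space and RAM in the later steps.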
If either you or bgtrak could make a short documentation for using convertf that would be great. I am really curious to see if I messed something up earlier, cause my results were pretty wild.
Actually, you don't need as much as 100 GB for the conversion, but you do need a lot of RAM on your computer. The conversion may take some time, maybe half an hour or more. On a slow computer it may not run at all, or it may freeze. Converting from geno to ped generates a file of about the same size, near 5 GB. However, if your Linux is using a lot of swap space, that is your weak point. Make sure to allocate plenty of RAM, and do the conversion on the most powerful PC you have.
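A VM or WSL install only sees the memory that has been assigned to it, so it can be worth checking what Linux actually has before starting. A minimal sketch, assuming a Linux system that exposes /proc/meminfo (values there are in kB):

```shell
#!/bin/sh
# Report total RAM and swap as seen by this Linux system (VM and WSL included).
if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/  { printf "RAM:  %.1f GiB\n", $2 / 1024 / 1024 }
         /^SwapTotal:/ { printf "Swap: %.1f GiB\n", $2 / 1024 / 1024 }' /proc/meminfo
else
    echo "/proc/meminfo not found; this check only works on Linux"
fi
```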
It's not a big deal, messing around with plink with the inconsistencies errors is more ball busting.
Download something like Ubuntu WSL for Windows, then download convertf from GitHub, then just use the command
Code:
convertf -p par.EIGENSTRAT.ped
par.EIGENSTRAT.ped is the par file with the instructions for the files in your directory.
Mine looked like this, for example:
Code:
genotypename: dosasv54merged.geno
snpname: dosasv54merged.snp
indivname: dosasv54merged.ind
outputformat: PED
genotypeoutname: dosasv54merged.ped
snpoutname: dosasv54merged.pedsnp
indivoutname: dosasv54merged.pedind
This is how you can convert your original 1240K or HO file into plink format.
After that, you can use the following command to convert to .bed, which is what I do, and then merge it with your own .bed files, which plink generates from your own DNA file(s).
Code:
plink --file dosasv54merged --make-bed --out dosasv54merged
The above turns the .ped files into .bed; then you can just go ahead and merge with plink again using --bmerge.
For example:
Code:
plink --allow-no-sex --bfile dosasv54merged --bmerge newfiletobemerged --out dosasv54merged2
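On the inconsistency errors mentioned earlier: when PLINK 1.9 finds variants whose alleles it cannot reconcile, the merge fails and the offending SNP IDs are written to a file named after the output prefix with -merge.missnp appended. Below is a rough sketch of one way to handle that, reusing the file names from the command above; the helper name merge_with_retry and the _clean suffix are made up for illustration. Dropping the SNPs is the blunt option; PLINK also has --flip for cases that are really strand flips.

```shell
#!/bin/sh
# Merge two PLINK binary filesets; if PLINK lists unreconcilable variants in
# <out>-merge.missnp, exclude them from both filesets and merge again.
# Usage: merge_with_retry BASE_PREFIX OWN_PREFIX OUT_PREFIX
merge_with_retry() {
    base=$1; own=$2; out=$3

    # First attempt; a failure here is expected when alleles disagree.
    plink --allow-no-sex --bfile "$base" --bmerge "$own" --out "$out" || true

    if [ -f "${out}-merge.missnp" ]; then
        # Remove the problem variants from each side, then merge the cleaned copies.
        plink --bfile "$base" --exclude "${out}-merge.missnp" --make-bed --out "${base}_clean"
        plink --bfile "$own" --exclude "${out}-merge.missnp" --make-bed --out "${own}_clean"
        plink --allow-no-sex --bfile "${base}_clean" --bmerge "${own}_clean" --out "$out"
    fi
}

# Only run when plink and the input fileset are actually present.
if command -v plink >/dev/null 2>&1 && [ -f dosasv54merged.bed ]; then
    merge_with_retry dosasv54merged newfiletobemerged dosasv54merged2
else
    echo "plink or dosasv54merged.bed not found; nothing merged"
fi
```

If a lot of SNPs end up in the .missnp file, that is worth a look before trusting any downstream results.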
Keep in mind that you can use PLINK file formats with admixtools2 on R-studio, so no need to re-convert back to .geno if you cba.
Thanks bro, will give it a try. Hopefully I won't get nonsensical results this time.
It is not a good idea to use .ped as the extension for the parameter file. You can use .txt or .par instead.
.ped is already the extension of the PED genotype file format, so just try to avoid that kind of mess. The original software, from GitHub or the download, comes with excellent example parameter files, which is why I did not provide more on that.
I should have mentioned that if you want to edit these files, you can just go ahead and download Visual Studio Code for windows.
https://code.visualstudio.com/
Otherwise, just do what bgtrack says.
I would like to add that people can use plink files perfectly fine with admixtools2 on R-studio, one of the perks of the 2nd ed.
.bed would be the equivalent to .geno, .bim to .snp and .fam to .ind.
The process is exactly the same.
The only thing you need is a code editor like Visual Studio Code, which I use, to read and edit the index .fam file.
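If you would rather not edit the .fam by hand, the same change can be scripted. A .fam file has six whitespace-separated columns: family ID, individual ID, father, mother, sex, phenotype. My understanding is that admixtools2 takes the population label for PLINK input from the first (family ID) column, but treat that as an assumption and check the admixtools documentation. The sketch below uses a toy two-line .fam made up for illustration; relabel_fam, my_sample and MyPop are hypothetical names.

```shell
#!/bin/sh
# Set the first column (family ID) for one sample, leaving other rows untouched.
# Usage: relabel_fam IN.fam OUT.fam SAMPLE_ID NEW_LABEL
relabel_fam() {
    awk -v id="$3" -v pop="$4" '$2 == id { $1 = pop } { print }' "$1" > "$2"
}

# Toy .fam for illustration only (FID IID father mother sex phenotype).
printf '%s\n' 'fam1 my_sample 0 0 1 -9' 'fam2 sample2 0 0 2 -9' > toy.fam

relabel_fam toy.fam toy_relabelled.fam my_sample MyPop
cat toy_relabelled.fam
```

Writing to a new file instead of editing in place keeps the original .fam around in case something goes wrong.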
Finally got a scientifically viable model for myself:
p>0.05; se<0.05.
Had to read some academic paper on right tails to get to it though.
Model stands up for other populations, but the slavic proxy or the tail might need optimizing for some of them.
^Middle Ages Doclea(Montenegro) samples, substantially Slavic per G25.
^Two of the most admixed 1800s Albanian samples from the Southern Arc. Middle Age samples and the other non-admixed moderns completely fail with any of the proxies. More or less in line with G25 models for these samples I would say, albeit on the lower side.
I really am looking forward to the Kuline samples from the Danubian Limes being published, hoping they are high quality samples, that can be used to model early Slavic component in the whole Balkans.
right = c("Ethiopia_4500BP", "China_Tianyuan", "Yoruba.DG", "Serbia_IronGates_Mesolithic", "Lithuania_EMN_Narva", "Turkey_Barcin_LN.SG", "Russia_Steppe_Eneolithic",
"Israel_C", "Iran_GanjDareh_N", "Russia_Samara_EBA_Yamnaya", "Turkey_Alalakh_MLBA", "Moldova_MBA_Catacomb",
"Greece_BA_Mycenaean", "Slovenia_EIA", "Netherlands_EIA", "Bulgaria_EIA", "Russia_IA_Ingria.SG", "Sudan_EarlyChristian", "Spain_Greek_oLocal")
So in reality, this is guided by the right tail from the paper "Cosmopolitanism at the Roman Danubian Frontier, Slavic Migrations, and the Genomic Formation of Modern Balkan Peoples".
The only additions I made in order to lower standard errors are the Moldovan Catacomb, the Lithuanian EMN, and Bulgaria_EIA, which are in line with the suggestions about improving right tails in "Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture".
I could not find the documentation for the exact sample names they used, but I think the guesswork for stuff like Steppe_IA, Anatolia_N etc. should be good enough. Also Ethiopia 4500bp instead of West Africa Ancient, since I had no idea what that sample was referring to.
I wonder if it will work for you. For some of the Albanian samples in the HO dataset it did not work, and for some others it would work with minor tweaks to the Slavic proxy.
It works for me as well, good job.
39% Empuries2 + 52.7% Armenia_LBA + 8.36% Slav invader. Tail is 5.39% and s.e. almost at 5%.
Code:
Reading metadata...
ℹ Computing block lengths for 1150639 SNPs...
ℹ Computing 57 f4-statistics for block 713 out of 713...
ℹ "allsnps = TRUE" uses different SNPs for each f4-statistic
. . .
ℹ Computing admixture weights...
ℹ Computing standard errors...
ℹ Computing number of admixture waves...
warning: solve(): system is singular (rcond: 2.0141e-17); attempting approx solution
> results$weights
# A tibble: 3 × 5
  target left                       weight     se     z
  <chr>  <chr>                       <dbl>  <dbl> <dbl>
1 dosas  Spain_Hellenistic_Emporion 0.390  0.0636  6.12
2 dosas  Armenia_LBA.SG             0.527  0.0739  7.13
3 dosas  AV2                        0.0836 0.0467  1.79
> results$popdrop
# A tibble: 7 × 14
  pat      wt   dof chisq       p f4rank Spain_Hellen…¹ Armen…²    AV2 feasi…³ best  dofdiff
  <chr> <dbl> <dbl> <dbl>   <dbl>  <dbl>          <dbl>   <dbl>  <dbl> <lgl>   <lgl>   <dbl>
1 000       0    17  27.3 5.39e-2      2          0.390   0.527 0.0836 TRUE    NA         NA
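For anyone reading the weights table: the z column is just weight divided by se, and the weights should add up to roughly 1. A quick sanity check with awk, using the weight and se values copied from the results$weights output in this post (the file name weights.txt is arbitrary):

```shell
#!/bin/sh
# Columns: source population, weight, standard error (copied from the post).
cat > weights.txt <<'EOF'
Spain_Hellenistic_Emporion 0.390 0.0636
Armenia_LBA.SG 0.527 0.0739
AV2 0.0836 0.0467
EOF

# Recompute z = weight / se per source and the sum of all weights.
awk '{ sum += $2; printf "%s z=%.2f\n", $1, $2 / $3 }
     END { printf "sum of weights=%.4f\n", sum }' weights.txt
```

Small differences from the printed z values come from the weights and standard errors being rounded in the table.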