GitHub - hringbauer/ancIBD: Detecting IBD within low coverage ancient DNA data. Development repository for the software package that contains code for the manuscript.
github.com
So who is the genius that is going to figure this one out?
Hmm, looks like you first need to convert aDNA files to VCF format and impute them with GLIMPSE, which is a pretty massive first step.
FASTQ to SAM to BAM to Sorted-BAM to VCF is a pretty large undertaking for just a single file.
So it is possible for us to get ahead of the curve and figure out which samples are best to use before they even come out in papers. But it would take a herculean effort by several members of the community just to convert the samples to VCF.
Most samples are already on ENA as sorted BAMs, though, so a wide-scale project is feasible. If it is FASTQ, forget it.
Hi Jovialis, here is a link with the imputed VCF and EIGENSTRAT files for Antonio_2019 (it also includes a non-imputed VCF called with the "1000 Genomes"). They are not GLIMPSE-imputed, though; they are Beagle-imputed. The VCF contains all the samples, but they can be extracted individually, or you can convert the VCF to PLINK and extract each sample to VCF. The link (after download, untar, then navigate and untar again):
I converted the imputed VCF into a "Reich" 1240K and HO EIGENSTRAT/PLINK set; with 1240K, coverage is about 96% per sample. I played around with it: converting, extracting, merging. Here is the link:
Some examples:
After filtering, edit the VCF headers:
bcftools head Romans_1.vcf.gz > 1header.hr
bcftools head Romans_2.vcf.gz > 2header.hr
bcftools head Romans_3.vcf.gz > 3header.hr
...and so on
Replace the contents of every header file with the header below (copy and paste). In 2header.hr, change ##contig=<ID=1> to ##contig=<ID=2> and save it, and so on for the other chromosomes.
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=30/10/2021 - 05:10:05
##source=GLIMPSE_phase v1.0.0
##contig=<ID=1>
##INFO=<ID=RAF,Number=A,Type=Float,Description="ALT allele frequency in the reference panel">
##INFO=<ID=AF,Number=A,Type=Float,Description="ALT allele frequency computed from DS/GP field across target samples">
##INFO=<ID=INFO,Number=A,Type=Float,Description="Imputation information or quality score">
##INFO=<ID=BUF,Number=A,Type=Integer,Description="Is it a variant site falling within buffer regions? (0=no/1=yes)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype posteriors">
##FORMAT=<ID=HS,Number=1,Type=Integer,Description="Sampled haplotype pairs packed into intergers (max: 16 pairs, see NMAIN header line)">
##NMAIN=15
##INFO=<ID=pan_troglodytes,Number=1,Type=String,Description="allele observed in pan_troglodytes">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=MASK_1000G,Number=0,Type=Flag,Description="SNP is in 1000G strict mask region">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##bcftools_viewVersion=1.13+htslib-1.13
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R15 R16 R17 R18 R19 R22 R24 R25 R26 R27 R28 R29 R30 R31 R32 R33 R34 R35 R36 R37 R38 R39 R40 R41 R42 R43 R44 R45 R47 R49 R50 R51 R52 R53 R54 R55 R56 R57 R58 R59 R60 R61 R62 R63 R64 R65 R66 R67 R68 R69 R70 R71 R72 R73 R75 R76 R78 R80 R81 R104 R105 R106 R107 R108 R109 R110 R111 R113 R114 R115 R116 R117 R118 R120 R121 R122 R123 R125 R126 R128 R130 R131 R132 R133 R134 R136 R137 R435 R436 R437 R473 R474 R475 R835 R836 R850 R851 R969 R970 R973 R1014 R1015 R1016 R1021 R1219 R1220 R1221 R1224 R1283 R1285 R1286 R1287 R1288 R1289 R1290 R1543 R1544 R1545 R1547 R1548 R1549 R1550 R1551
Now we replace the headers:
bcftools reheader -h 1header.hr Romans_1.vcf.gz > Romans_chr1.vcf.gz
bcftools reheader -h 2header.hr Romans_2.vcf.gz > Romans_chr2.vcf.gz
bcftools reheader -h 3header.hr Romans_3.vcf.gz > Romans_chr3.vcf.gz
... and so on
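If you would rather not edit 22 header files by hand, something like the following should do the same job. It is only a sketch: it assumes 1header.hr already holds the edited header with the ##contig=<ID=1> line, and that the files follow the Romans_N.vcf.gz naming used above. The function names make_chr_headers and reheader_all are made up for this example and are not part of bcftools.

```shell
#!/usr/bin/env bash

# Write 2header.hr ... 22header.hr by copying the chromosome 1 header
# and changing only the contig line.
make_chr_headers() {
    local template="$1"     # edited header for chromosome 1, e.g. 1header.hr
    local outdir="${2:-.}"  # where to write the other header files
    local chr
    for chr in $(seq 2 22); do
        sed "s/^##contig=<ID=1>\$/##contig=<ID=${chr}>/" "$template" \
            > "${outdir}/${chr}header.hr"
    done
}

# Apply each header to its per-chromosome VCF (needs bcftools on the PATH).
reheader_all() {
    local chr
    for chr in $(seq 1 22); do
        bcftools reheader -h "${chr}header.hr" "Romans_${chr}.vcf.gz" \
            > "Romans_chr${chr}.vcf.gz"
    done
}
```

Running make_chr_headers 1header.hr and then reheader_all in the folder with the VCFs should give the same Romans_chrN.vcf.gz files as the manual steps. Check one header with bcftools head before trusting the rest.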
Only ITS5 had a sub-optimal p-value, but the other metrics looked good. Here are modern Italian populations using C_Italian_N and C_Italian_ChL, along with Steppe_EMBA, and Iran_N_CHG. Since TSI and the North can be modeled with WHG, it works with the ChL central Italian sample since there was...
www.mediafire.com
The ch_all_C_Ita_N-C_Ita_ChL file has the relations of R6 R10 R2 R3 R8 R9 R19 R16 R17 R18 R4 R5 R1014 with the R samples; the ch_all_Antonio_2019 file has the complete relations of all the samples.
Hi Jovialis,
I found the link to the IBD study. They provide an ancient-samples V54.1 file, but it just contains the chromosome matches between the samples, and not all samples passed their quality filters. I sorted the file by chromosome, split it into 22 files, and then ran ancIBD. I am providing the link below. I also found an already processed ancIBD run for Eurasia and have included that below as well. I used one of your files as info about the V54.1 samples.
ancIBD identifies identity-by-descent regions in ancient DNA using a hidden Markov model optimized for these low-coverage data. Analysis of 4,248 individuals demonstrates that ancIBD can identify up to sixth-degree relatives and provides genealogical insights into ancient populations.
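For anyone who wants to repeat the split-by-chromosome step, here is a rough sketch with awk. It assumes the table is tab-separated with a single header row, and you have to tell it which column holds the chromosome; the column position, the function name split_by_chr, and the chN.tsv output names are all assumptions for this example, not something taken from the study's files. With this approach the file does not need to be sorted first.

```shell
#!/usr/bin/env bash

# Split a tab-separated table into one file per chromosome,
# repeating the header row at the top of every output file.
split_by_chr() {
    local infile="$1"       # tab-separated table with one header row
    local col="$2"          # 1-based index of the chromosome column (assumed)
    local outdir="${3:-.}"  # directory for the ch<N>.tsv files
    awk -F'\t' -v col="$col" -v dir="$outdir" '
        NR == 1 { header = $0; next }
        {
            out = dir "/ch" $col ".tsv"
            # First row seen for this chromosome: start the file with the header.
            if (!(out in seen)) { print header > out; seen[out] = 1 }
            print > out
        }
    ' "$infile"
}
```

For example, split_by_chr segments.tsv 3 out would write out/ch1.tsv, out/ch2.tsv and so on if the chromosome is in the third column (segments.tsv and out are placeholder names, and the out directory must already exist).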
I uploaded the v54.1 (Reich) AncIBD (Post #12).
It takes some time to go through the samples … sorting.
The V54_ibd_ind.tsv file is a summary of closer relations; the V54_ch_all.tsv file could contain ancestral connections, I think :)
I merged and processed all the Ancient Italians in the v54.1 and the Allentoft.
I extracted 220 ibd files and 306 ch files, each file is dedicated to a sample ... kind of a chr. deep dive.
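A per-sample file like these can be pulled out of a combined table with a small awk filter. This is a sketch under assumptions: the table is tab-separated with a header row, and the two sample IDs of each pair sit in two known columns (defaulting to columns 1 and 2 here, which may not match the real layout). The function name extract_sample is made up for the example.

```shell
#!/usr/bin/env bash

# Print the header plus every row in which the given sample appears
# in either of the two sample-ID columns.
extract_sample() {
    local infile="$1"   # tab-separated table with one header row
    local sample="$2"   # sample ID to look for
    local c1="${3:-1}"  # column of the first sample ID (assumed)
    local c2="${4:-2}"  # column of the second sample ID (assumed)
    awk -F'\t' -v s="$sample" -v c1="$c1" -v c2="$c2" '
        NR == 1 || $c1 == s || $c2 == s
    ' "$infile"
}
```

Something like extract_sample V54_ch_all.tsv SAMPLE_ID 8 9 > SAMPLE_ID_ch.tsv would then give one file per sample; SAMPLE_ID and the column numbers 8 and 9 are placeholders, so check the header of your own file first.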
... the download link: