G25 vs qpAdm/admixtools2 comparison

eupator

Had some time to kill so here goes.

These are my best ‘mixed mode’ results for each period on G25 Illustrative DNA. The raw DNA file used to produce these results was an extract from my Nebula Genomics whole-genome sequence file, which has 99.9% SNP coverage of the Eurogenes template, meaning it is the best possible input that could have been used.

Screenshot-2023-12-21-at-13-20-28-ILLUSTRATIVE-DNA.png

Screenshot-2023-12-21-at-13-20-19-ILLUSTRATIVE-DNA.png





Next, I tried to recreate all these models in qpAdm/admixtools2 to check the validity of my results. The reference samples were exactly the same ones used by Illustrative DNA. For “Corded Ware Culture”, for example, I used: N44, N45, pcw040, pcw041, pcw061, pcw070, pcw211, pcw350, pcw361. The same goes for the rest of the references. The raw DNA file merged into the Reich dataset was also the same one used to produce the G25 results.

These are the qpAdm/admixtools2 results, using a fairly ‘easy’ right list to give the model some breathing room.

Code:
> right = c('Mbuti.DG', 'Iran_N', 'Natufian', 'Iberomaurusian', 'Russia_AfontovaGora3', 'Russia_MA1_HG.SG', 'Turkey_Epipaleolithic', 'WHG', 'Iraq_PPN', 'ONG.SG')
> target = c('dosas')
> left = c('Copper_Age_Anatolian','Corded_Ware_Culture')
> results = qpadm(prefix, left, right, target, allsnps = TRUE)
ℹ Reading metadata...
ℹ Computing block lengths for 1150639 SNPs...
ℹ Computing 18 f4-statistics for block 713 out of 713...
ℹ "allsnps = TRUE" uses different SNPs for each f4-statistic
  Number of SNPs used for each f4-statistic:
    pop1                 pop2     pop3                  pop4      n
1  dosas Copper_Age_Anatolian Mbuti.DG        Iberomaurusian 639471
2  dosas Copper_Age_Anatolian Mbuti.DG                Iran_N 613110
3  dosas Copper_Age_Anatolian Mbuti.DG              Iraq_PPN 271162
4  dosas Copper_Age_Anatolian Mbuti.DG              Natufian 139462
5  dosas Copper_Age_Anatolian Mbuti.DG                ONG.SG 665456
6  dosas Copper_Age_Anatolian Mbuti.DG  Russia_AfontovaGora3 155549
7  dosas Copper_Age_Anatolian Mbuti.DG      Russia_MA1_HG.SG 472363
8  dosas Copper_Age_Anatolian Mbuti.DG Turkey_Epipaleolithic 510849
9  dosas Copper_Age_Anatolian Mbuti.DG                   WHG 462873
10 dosas  Corded_Ware_Culture Mbuti.DG        Iberomaurusian 663322
11 dosas  Corded_Ware_Culture Mbuti.DG                Iran_N 630366
12 dosas  Corded_Ware_Culture Mbuti.DG              Iraq_PPN 276211
13 dosas  Corded_Ware_Culture Mbuti.DG              Natufian 140980
14 dosas  Corded_Ware_Culture Mbuti.DG                ONG.SG 708861
15 dosas  Corded_Ware_Culture Mbuti.DG  Russia_AfontovaGora3 156781
16 dosas  Corded_Ware_Culture Mbuti.DG      Russia_MA1_HG.SG 497634
17 dosas  Corded_Ware_Culture Mbuti.DG Turkey_Epipaleolithic 524260
18 dosas  Corded_Ware_Culture Mbuti.DG                   WHG 467622
ℹ Computing admixture weights...
ℹ Computing standard errors...
ℹ Computing number of admixture waves...
> results$weights
# A tibble: 2 × 5
  target left                 weight     se     z
  <chr>  <chr>                 <dbl>  <dbl> <dbl>
1 dosas  Copper_Age_Anatolian  0.622 0.0447 13.9 
2 dosas  Corded_Ware_Culture   0.378 0.0447  8.45
> results$popdrop
# A tibble: 3 × 13
  pat      wt   dof chisq        p f4rank Copper_Age_Anatolian Corded_Ware_Cul…¹ feasi…² best  dofdiff chisq…³ p_nes…⁴
  <chr> <dbl> <dbl> <dbl>    <dbl>  <dbl>                <dbl>             <dbl> <lgl>   <lgl>   <dbl>   <dbl>   <dbl>
1 00        0     8  22.1 4.73e- 3      1                0.622             0.378 TRUE    NA         NA     NA       NA
2 01        1     9 219.  3.41e-42      0                1                NA     TRUE    TRUE        0   -270.       1
3 10        1     9 489.  1.26e-99      0               NA                 1     TRUE    TRUE       NA     NA       NA
# … with abbreviated variable names ¹Corded_Ware_Culture, ²feasible, ³chisqdiff, ⁴p_nested

The same model gives me 62.2% Copper Age Anatolian (s.e. 4.47%) and 37.8% Corded Ware Culture (s.e. 4.47%). The percentages are similar to G25, but the p-value is extremely low at 0.00473, meaning the model fails (a p-value above 0.05 is required).
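For anyone who wants to check the pass/fail call by hand, the p-value in the popdrop table can be recomputed from the chisq and dof columns as a chi-square upper-tail probability. The sketch below uses only base R and the rounded values printed above. The counting rules for the number of f4-statistics and the degrees of freedom are an assumption about how the rank test is set up, not something taken from the admixtools2 source, although they do reproduce the 18 and 8 shown in the output.

```r
# Values copied from the qpadm output above (2 left sources, 10 right populations).
n_left  <- 2
n_right <- 10
chisq   <- 22.1   # rounded chisq from results$popdrop, pattern "00"

# Assumed counting rules (not taken from the admixtools2 source):
# one f4 per left source and per non-base right population, and a
# rank (n_left - 1) test on an n_left x (n_right - 1) matrix.
n_f4 <- n_left * (n_right - 1)
rank <- n_left - 1
dof  <- (n_left - rank) * (n_right - 1 - rank)

# Upper-tail chi-square probability, i.e. the p-value of the model.
p_value <- pchisq(chisq, df = dof, lower.tail = FALSE)

cat("f4 statistics:", n_f4, "\n")
cat("dof:", dof, "\n")
cat("p-value:", signif(p_value, 3), "\n")
cat("passes 0.05 threshold:", p_value > 0.05, "\n")
```

Because the chisq value is rounded to three significant figures in the printout, the recomputed p-value will only match the table approximately.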

Now for the Iron Age, where g25 gives me 49% IA Thracian and 51% Etiuni (Caucasus-Trialeti).


Code:
> results = qpadm(prefix, left, right, target, allsnps = TRUE)
ℹ Reading metadata...
ℹ Computing block lengths for 1150639 SNPs...
ℹ Computing 18 f4-statistics for block 713 out of 713...
ℹ "allsnps = TRUE" uses different SNPs for each f4-statistic
  Number of SNPs used for each f4-statistic:
    pop1     pop2     pop3                  pop4      n
1  dosas   Etiuni Mbuti.DG        Iberomaurusian 657869
2  dosas   Etiuni Mbuti.DG                Iran_N 627599
3  dosas   Etiuni Mbuti.DG              Iraq_PPN 275421
4  dosas   Etiuni Mbuti.DG              Natufian 140826
5  dosas   Etiuni Mbuti.DG                ONG.SG 693364
6  dosas   Etiuni Mbuti.DG  Russia_AfontovaGora3 156721
7  dosas   Etiuni Mbuti.DG      Russia_MA1_HG.SG 489370
8  dosas   Etiuni Mbuti.DG Turkey_Epipaleolithic 521589
9  dosas   Etiuni Mbuti.DG                   WHG 467356
10 dosas Thracian Mbuti.DG        Iberomaurusian 641747
11 dosas Thracian Mbuti.DG                Iran_N 616376
12 dosas Thracian Mbuti.DG              Iraq_PPN 272303
13 dosas Thracian Mbuti.DG              Natufian 139943
14 dosas Thracian Mbuti.DG                ONG.SG 668717
15 dosas Thracian Mbuti.DG  Russia_AfontovaGora3 155919
16 dosas Thracian Mbuti.DG      Russia_MA1_HG.SG 474445
17 dosas Thracian Mbuti.DG Turkey_Epipaleolithic 512834
18 dosas Thracian Mbuti.DG                   WHG 464542
ℹ Computing admixture weights...
ℹ Computing standard errors...
ℹ Computing number of admixture waves...
> results$weights
# A tibble: 2 × 5
  target left     weight     se     z
  <chr>  <chr>     <dbl>  <dbl> <dbl>
1 dosas  Thracian  0.542 0.0764  7.10
2 dosas  Etiuni    0.458 0.0764  5.99
> results$popdrop
# A tibble: 3 × 13
  pat      wt   dof chisq        p f4rank Thracian Etiuni feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 00        0     8  41.7 1.56e- 6      1    0.542  0.458 TRUE     NA         NA      NA         NA
2 01        1     9 318.  4.12e-63      0    1     NA     TRUE     TRUE        0     -74.4        1
3 10        1     9 392.  6.08e-79      0   NA      1     TRUE     TRUE       NA      NA         NA

qpAdm gives me roughly the same percentages, 54.2% (s.e. 7.64%) for Thracian and 45.8% (s.e. 7.64%) for Etiuni, but the p-value is again abysmally low at 0.00000156, indicating a fail.


I swap around some stuff and run my own model:

Code:
ℹ Reading metadata...
ℹ Computing block lengths for 1150639 SNPs...
ℹ Computing 57 f4-statistics for block 713 out of 713...
ℹ "allsnps = TRUE" uses different SNPs for each f4-statistic
  Number of SNPs used for each f4-statistic:
.
.
.
ℹ Computing admixture weights...
ℹ Computing standard errors...
ℹ Computing number of admixture waves...

warning: solve(): system is singular (rcond: 2.0141e-17); attempting approx solution
> results$weights
# A tibble: 3 × 5
  target left                       weight     se     z
  <chr>  <chr>                       <dbl>  <dbl> <dbl>
1 dosas  Spain_Hellenistic_Emporion 0.390  0.0636  6.12
2 dosas  Armenia_LBA.SG             0.527  0.0739  7.13
3 dosas  AV2                        0.0836 0.0467  1.79
> results$popdrop
# A tibble: 7 × 14
  pat      wt   dof chisq        p f4rank Spain_Hellen…¹ Armen…²     AV2 feasi…³ best  dofdiff
  <chr> <dbl> <dbl> <dbl>    <dbl>  <dbl>          <dbl>   <dbl>   <dbl> <lgl>   <lgl>   <dbl>
1 000       0    17  27.3 5.39e- 2      2          0.390   0.527  0.0836 TRUE    NA         NA
It works.

39% Empuries2 + 52.7% Armenia_LBA + 8.36% Slav AV2. The tail probability is 5.39%, and the standard errors range from 4.67% to 7.39%.

Before someone mentions the obvious, that this is a three-way model: nowhere on Illustrative DNA do I get a result that is even remotely close to the above.
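The figures quoted for this model can be cross-checked from the printed tables alone. The sketch below, in base R, recomputes the z column as weight divided by standard error and the tail probability from the chisq and dof columns. All inputs are the rounded values from the output above, so small differences in the last digit are expected.

```r
# Values copied from results$weights and results$popdrop above.
weights <- c(Spain_Hellenistic_Emporion = 0.390,
             Armenia_LBA.SG             = 0.527,
             AV2                        = 0.0836)
se    <- c(0.0636, 0.0739, 0.0467)
chisq <- 27.3
dof   <- 17

# z-score of each admixture weight: weight divided by its standard error.
z <- weights / se

# Tail probability of the full three-way model (pattern "000").
p_value <- pchisq(chisq, df = dof, lower.tail = FALSE)

print(round(z, 2))
cat("sum of weights:", sum(weights), "\n")
cat("p-value:", signif(p_value, 3), "\n")
cat("passes 0.05 threshold:", p_value > 0.05, "\n")
```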


TLDR:

Those who put blind faith in their G25 results should keep in mind that these do not necessarily correspond to proper f-statistics modelling.
 
Very interesting. If you have any spare time, could you run Greek populations using your three-way model?
 
Sure, but keep in mind that the Greek references in the Reich dataset are from the HO database, and a lot of them seem 'mixed' rather than regionally segregated as in those G25 lists.
 
G25 is a worldwide example of confirmation bias.
 
So how can G25 coordinates be used for accurate results?
 
Eupator, can you test this three-way model for I15707? It is a post-medieval Albanian sample with extremely low Slavic ancestry. Its biggest component is consistently picked up in G25 models of modern Albanians, and I am curious whether it holds up in qpAdm.

RSzpURr.png


Suggested model:
Component 1: I18832 or I10379; these two profiles are very interchangeable.
Component 2: I16253 or I14690; I16253 is preferred, but I do not see it in the 1240k list.
Component 3: Art039
 
Then which program should I use with my raw DNA data to get accurate results?
I don't want to sound sarcastic, but you should use qpAdm.
 
I have tried to use it, but I did not understand how to use it. I could send you my raw DNA data to run in qpAdm.
 
Nah, tbh it is extremely complex and time consuming. It took me a month of trial and error to figure it out with AI, working on it around the clock. That was a while ago. Sorry buddy, but you are on your own. Nevertheless, I proved it is possible. However, once they update the Reich lab dataset, I may endeavor to do it again. If so, I'll take better notes.
 
Nah, tbh it is extremely complex and time consuming. It took me a month of trial and error to figure it out with AI. That was a while ago. Sorry buddy, but you are on your own. Nevertheless, I proved it is possible. However, once they update the Reich lab dataset, I may endeavor to do it again. If so, I'll take better notes and share them.
Oh, I understand.
 
I have tried to use it, but I did not understand how to use it. I could send you my raw DNA data to run.
My advice is to install Ubuntu so you can use Linux if you are on Windows.

Convert your 23andMe data to PLINK format.

Convert the Reich Lab dataset to PLINK format.

Then you have to merge them, but only the unique SNPs from your data. You don't want to alter the Reich Lab set. It needs to be a one-sided merge that defers to the Reich Lab set.
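To make the idea of a one-sided merge concrete, here is a toy illustration in base R. It is only one reading of the advice above: the reference panel defines which SNPs exist, and the personal file contributes a call only where its SNP ID is already in the reference. The data frames, column names, and SNP IDs are all hypothetical, and this is not the actual PLINK procedure, just the logic of a merge that never alters the reference side.

```r
# Hypothetical toy data: a reference SNP list and a personal genotype file.
reference <- data.frame(
  snp        = c("rs1", "rs2", "rs3", "rs4"),
  ref_allele = c("A", "C", "G", "T"),
  alt_allele = c("G", "T", "A", "C"),
  stringsAsFactors = FALSE
)
personal <- data.frame(
  snp      = c("rs2", "rs3", "rs5"),
  genotype = c("CT", "AA", "GG"),
  stringsAsFactors = FALSE
)

# One-sided merge: keep every reference row in its original order and
# look up the personal call by SNP ID. SNPs missing from the personal
# file become NA; SNPs absent from the reference (rs5) are dropped.
merged <- reference
merged$genotype <- personal$genotype[match(reference$snp, personal$snp)]

print(merged)
cat("personal SNPs dropped:", setdiff(personal$snp, reference$snp), "\n")
```

The reference columns come through unchanged, which is the property the advice asks for. Allele and strand alignment, which a real merge also has to handle, is left out of this sketch.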
 
Razib Khan has converted the Reich Lab set to PLINK. Use that as the standard for how it should be done when asking the AI.
 
ChatGPT is luckily very good at Linux and PowerShell prompts.
 
After everything is merged, you need to run it in admixtools2. Eupator has a setup guide in the admixtools subforum.
 
I think Eupator has made a tutorial
 
Learn how to use it using my tutorial.

After you learn, I will convert your file and merge it for plink, because I am such a swell guy.
 
Nice, thanks.
 
This is the tutorial, right?
 