Comparing Ancient Greek populations to modern Greeks and Italians

Idontknowwhatimdoing · Aug 2, 2023

Francesco said:
I see that those two samples from Halicarnassus are from the Hellenistic age, but I assume that in the Iron Age the inhabitants of the city might have been quite similar (maybe I'm wrong, though).

They seem to fall between Bronze Age Mycenaeans and Phrygians, although maybe the latter aren't the best proxies for the Carian inhabitants of southwestern Anatolia.

The Gordion_Anc samples are not all Phrygians: only 2, from 650 BC, are Phrygians. The majority of them are from the Hellenistic period and have clear Greek admixture.

The Halikarnassos samples seem to be a mix of mainland Greek + Levantine. In my old qpAdm models they showed more Natufian than Cypriots do.

2.Anatolia_Center_Phrygian_650BC,0.112116,0.1502985,-0.0352605,-0.0734825,-0.008771,-0.0262155,0.00188,-0.0084225,-0.0191225,0.016401,0.0060895,0.004571,-0.0077305,0.000757,-0.011672,-0.007359,0.009192,-0.0003165,0.0050905,-0.000688,-0.001123,0.0052555,-0.003574,0.0025905,0.0037125

Francesco · Aug 9, 2023

Idontknowwhatimdoing said:
The Gordion_Anc samples are not all Phrygians: only 2, from 650 BC, are Phrygians. The majority of them are from the Hellenistic period and have clear Greek admixture.

That's interesting and makes sense. I had a similar thought looking at their position on a PCA.
So, the "eastern-shifted" Phrygian samples should be more representative of early Iron Age central Anatolians.
I wonder if the western-shifted Phrygian samples, on the other hand, could be somewhat close, at least to some degree, to Ionian Greeks of the Archaic and Classical ages. I would say it's a possibility.

Idontknowwhatimdoing · Aug 10, 2023

Francesco said:
That's interesting and makes sense. I had a similar thought looking at their position on a PCA.
So, the "eastern-shifted" Phrygian samples should be more representative of early Iron Age central Anatolians.
I wonder if the western-shifted Phrygian samples, on the other hand, could be somewhat close, at least to some degree, to Ionian Greeks of the Archaic and Classical ages. I would say it's a possibility.

We have a lot of samples from 700-480 BC Mugla, West Anatolia. They are almost the same as the Mycenaeans from Attica but with a bit more Iran N. Scroll up; I posted the models.

A quick qpAdm chart I made. Many models have bad p-values, but the proportions shouldn't be affected.

Code:

right = c('Ethiopia_4500BP.DG', 'Morocco_Iberomaurusian', 'Turkey_Epipaleolithic', 'Turkey_Boncuklu_N', 'Iran_Wezmeh_N.SG', 'Russia_Samara_HG+Russia_Karelia_HG', 'Switzerland_Bichon.SG', 'China_AmurRiver_LPaleolithic')
# Plus Jordan_PPNB in right for the Morocco_EN models. Yes, I tested whether it affects proportions in models without Morocco_EN: it does not, it just lowers the SE for Morocco_EN modelling.

If you want to check the Standard Errors:

Code:

P 0.0135
  target        left                      weight      se     z
  <chr>         <chr>                      <dbl>   <dbl> <dbl>
1 Greek_Cypriot Turkey_N                  0.500  0.0196  25.5 
2 Greek_Cypriot Israel_Natufian           0.0944 0.0210   4.49
3 Greek_Cypriot Iran_TepeAbdulHosein_N    0.242  0.0173  14.0 
4 Greek_Cypriot Russia_Samara_EBA_Yamnaya 0.141  0.0216   6.51
5 Greek_Cypriot Loschbour_WHG             0.0227 0.00915  2.48


P 0.0271
  target             left                      weight      se      z
  <chr>              <chr>                      <dbl>   <dbl>  <dbl>
1 Greek_Thessaloniki Turkey_N                  0.522  0.0136  38.5  
2 Greek_Thessaloniki Israel_Natufian           0.0121 0.0138   0.877
3 Greek_Thessaloniki Iran_TepeAbdulHosein_N    0.126  0.0127   9.91 
4 Greek_Thessaloniki Russia_Samara_EBA_Yamnaya 0.274  0.0172  16.0  
5 Greek_Thessaloniki Loschbour_WHG             0.0653 0.00691  9.44 


P 0.000601
  target       left                      weight      se     z
  <chr>        <chr>                      <dbl>   <dbl> <dbl>
1 Greek_Athens Turkey_N                  0.501  0.0139  36.1 
2 Greek_Athens Israel_Natufian           0.0458 0.0142   3.23
3 Greek_Athens Iran_TepeAbdulHosein_N    0.160  0.0125  12.8 
4 Greek_Athens Russia_Samara_EBA_Yamnaya 0.239  0.0156  15.3 
5 Greek_Athens Loschbour_WHG             0.0538 0.00651  8.26


PPNB in right
P 0.0424
  target        left                      weight      se     z
  <chr>         <chr>                      <dbl>   <dbl> <dbl>
1 Italian_South Turkey_N                  0.506  0.0307  16.5 
2 Italian_South Israel_Natufian           0.0431 0.0380   1.14
3 Italian_South Iran_TepeAbdulHosein_N    0.181  0.0189   9.60
4 Italian_South Russia_Samara_EBA_Yamnaya 0.202  0.0234   8.64
5 Italian_South Loschbour_WHG             0.0569 0.00998  5.70
6 Italian_South Morocco_EN                0.0112 0.00850  1.32


PPNB in right
P 0.000000354
  target   left                      weight      se     z
  <chr>    <chr>                      <dbl>   <dbl> <dbl>
1 Sicilian Turkey_N                  0.476  0.0350  13.6 
2 Sicilian Israel_Natufian           0.0680 0.0466   1.46
3 Sicilian Iran_TepeAbdulHosein_N    0.159  0.0145  10.9 
4 Sicilian Russia_Samara_EBA_Yamnaya 0.221  0.0177  12.5 
5 Sicilian Loschbour_WHG             0.0616 0.00744  8.28
6 Sicilian Morocco_EN                0.0142 0.00940  1.51


Natufian in right
P 5.21e-3
  target    left                       weight      se     z
  <chr>     <chr>                       <dbl>   <dbl> <dbl>
1 Sardinian Turkey_N                  0.694   0.0102  68.2 
2 Sardinian Iran_TepeAbdulHosein_N    0.0769  0.0120   6.39
3 Sardinian Russia_Samara_EBA_Yamnaya 0.0927  0.0159   5.83
4 Sardinian Loschbour_WHG             0.127   0.00721 17.6 
5 Sardinian Morocco_EN                0.00972 0.00266  3.65


Natufian in right
P 0.00271
  target        left                        weight      se      z
  <chr>         <chr>                        <dbl>   <dbl>  <dbl>
1 Italian_North Turkey_N                  0.538    0.00903 59.6  
2 Italian_North Iran_TepeAbdulHosein_N    0.0997   0.0117   8.51 
3 Italian_North Russia_Samara_EBA_Yamnaya 0.273    0.0157  17.4  
4 Italian_North Loschbour_WHG             0.0884   0.00674 13.1  
5 Italian_North Morocco_EN                0.000938 0.00266  0.353


Natufian in right
P 7.86e-10
  target  left                      weight      se     z
  <chr>   <chr>                      <dbl>   <dbl> <dbl>
1 Spanish Turkey_N                  0.497  0.00813 61.1 
2 Spanish Iran_TepeAbdulHosein_N    0.0702 0.00933  7.53
3 Spanish Russia_Samara_EBA_Yamnaya 0.289  0.0130  22.2 
4 Spanish Loschbour_WHG             0.130  0.00588 22.0 
5 Spanish Morocco_EN                0.0139 0.00223  6.27


P 0.000192
  target             left                      weight      se     z
  <chr>              <chr>                      <dbl>   <dbl> <dbl>
1 Lebanese_Christian Turkey_N                  0.416  0.0197  21.1 
2 Lebanese_Christian Israel_Natufian           0.180  0.0201   8.92
3 Lebanese_Christian Iran_TepeAbdulHosein_N    0.303  0.0183  16.6 
4 Lebanese_Christian Russia_Samara_EBA_Yamnaya 0.0780 0.0223   3.50
5 Lebanese_Christian Loschbour_WHG             0.0233 0.00917  2.54


P 0.00415
  target    left                      weight      se     z
  <chr>     <chr>                      <dbl>   <dbl> <dbl>
1 Bulgarian Turkey_N                  0.474  0.0109  43.4 
2 Bulgarian Iran_TepeAbdulHosein_N    0.116  0.0137   8.46
3 Bulgarian Russia_Samara_EBA_Yamnaya 0.325  0.0191  17.0 
4 Bulgarian Loschbour_WHG             0.0851 0.00749 11.4 


congo outgroup
P 0.0251
  target      left                      weight      se     z
  <chr>       <chr>                      <dbl>   <dbl> <dbl>
1 Palestinian Turkey_N                  0.349  0.0185  18.9 
2 Palestinian Israel_Natufian           0.227  0.0200  11.3 
3 Palestinian Iran_TepeAbdulHosein_N    0.226  0.0161  14.1 
4 Palestinian Russia_Samara_EBA_Yamnaya 0.117  0.0198   5.94
5 Palestinian Loschbour_WHG             0.0189 0.00785  2.41
6 Palestinian Dinka.DG                  0.0621 0.00539 11.5 


P 0.0459
  target    left                      weight      se     z
  <chr>     <chr>                      <dbl>   <dbl> <dbl>
1 Norwegian Turkey_N                   0.355 0.0111   31.9
2 Norwegian Russia_Samara_EBA_Yamnaya  0.516 0.0142   36.3
3 Norwegian Loschbour_WHG              0.129 0.00803  16.1
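
For anyone checking the standard errors: the z column in these tables is simply weight divided by se, so a source whose z is small is not clearly different from zero. Below is a minimal sketch in Python using the Greek_Thessaloniki rows above. The cutoff of 3 and the helper names are assumptions made for illustration; neither the thread nor qpAdm prescribes them.

```python
# Rows copied from the Greek_Thessaloniki table: (source, weight, se)
rows = [
    ("Turkey_N", 0.522, 0.0136),
    ("Israel_Natufian", 0.0121, 0.0138),
    ("Iran_TepeAbdulHosein_N", 0.126, 0.0127),
    ("Russia_Samara_EBA_Yamnaya", 0.274, 0.0172),
    ("Loschbour_WHG", 0.0653, 0.00691),
]

Z_CUTOFF = 3.0  # assumed cutoff, chosen only for illustration


def z_scores(table):
    """Return {source: weight / se} for each row."""
    return {source: weight / se for source, weight, se in table}


def weak_sources(table, cutoff=Z_CUTOFF):
    """Sources whose weight is within `cutoff` standard errors of zero."""
    return [source for source, z in z_scores(table).items() if abs(z) < cutoff]


if __name__ == "__main__":
    for source, z in z_scores(rows).items():
        print(f"{source:28s} z = {z:6.2f}")
    print("weakly supported:", weak_sources(rows))
```

The recomputed values differ slightly from the printed z column because the table shows rounded weights and standard errors.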

Francesco · Aug 10, 2023

Idontknowwhatimdoing said:
We have a lot of samples from 700-480 BC Mugla, West Anatolia. They are almost the same as the Mycenaeans from Attica but with a bit more Iran N. Scroll up; I posted the models.

Do you remember which study the Mugla samples are from? I would like to check in the supplementary materials whether they are associated with a Carian context or with a Greek one.

Idontknowwhatimdoing · Aug 10, 2023

Francesco said:
Do you remember which study the Mugla samples are from? I would like to check in the supplementary materials whether they are associated with a Carian context or with a Greek one.

I don't know. I was asking people whether they are Carians or Greeks archaeologically, and no one seems to know.

Here are their coordinates and IDs:

Code:

TUR_Aegean_Mugla_Degirmendere_Anc:I20224,0.122929,0.160454,-0.031678,-0.070414,0.000615,-0.018686,-0.00047,-0.011076,-0.001023,0.028976,0.001461,0.004796,-0.019772,-0.005367,-0.017372,-0.00053,0.011995,0.00076,0.003645,-0.01063,-0.005865,0.00643,-0.009367,-0.001205,-0.002994
TUR_Aegean_Mugla_Degirmendere_Anc:I20225,0.117238,0.15436,-0.033941,-0.075905,0.001231,-0.026216,0.00705,-0.006,0.001227,0.031162,0.007957,0.006145,-0.021556,-0.007569,-0.013572,0.011933,0.031683,-0.003294,0.004399,-0.00075,-0.000873,0.00507,-0.007395,0.000843,0.005149
TUR_Aegean_Mugla_Degirmendere_Anc:I20226,0.10927,0.160454,-0.024513,-0.070737,0.00277,-0.027889,0,-0.002077,-0.008999,0.035172,0.001299,0.011989,-0.015907,0.007844,-0.016286,0.001326,0.018384,-0.004814,0.00993,-0.006753,-0.009109,0.003215,0.007025,-0.001566,-0.006227
TUR_Aegean_Mugla_Degirmendere_Anc:I20227,0.10927,0.152329,-0.031301,-0.069122,0.006463,-0.034025,0.00141,-0.004615,-0.011044,0.028611,-0.00065,0.004946,-0.009068,-0.003303,-0.018865,0.009281,0.026077,-0.005068,0.008045,-0.011881,-0.01148,0.000618,0.003204,0.003374,0.002155
TUR_Aegean_Mugla_Degirmendere_Anc:I20228,0.108132,0.153345,-0.040729,-0.063954,0.010463,-0.022032,0.000705,-0.002769,-0.002863,0.025513,-0.002598,0.002698,-0.015312,-0.000688,-0.014251,0.003315,0.015646,0.006208,0.014581,-0.002751,-0.006489,0.003586,-0.000123,0.002771,0.002515
TUR_Aegean_Mugla_Degirmendere_Anc:I20229,0.111547,0.157407,-0.029793,-0.069445,0.000615,-0.02008,-0.000705,-0.013153,-0.006954,0.037358,-0.000487,0.008692,-0.016353,0.007019,-0.023344,-0.005967,0.011604,0.007855,0.002765,-0.003502,-0.007736,0.000618,0.005053,-0.001566,-0.000239
TUR_Aegean_Mugla_Degirmendere_Anc:I20230,0.103579,0.158423,-0.027907,-0.062985,0.011387,-0.024263,0.003055,-0.008307,-0.010022,0.034443,0.002598,0.014237,-0.017245,-0.007569,-0.020087,0.004906,0.032596,-0.00228,0.005531,-0.001376,-0.003993,0.001855,-0.00037,-0.000723,0.00491
TUR_Aegean_Mugla_Degirmendere_Anc:I20231,0.1161,0.161469,-0.038843,-0.075905,0.003077,-0.025379,0.00047,-0.006923,-0.010226,0.029887,0.001137,0.007044,-0.016204,0.00289,-0.019815,0.001458,0.01004,0.005701,0.010307,-0.002751,-0.00574,0.004822,-0.003451,-0.004217,-0.012693
TUR_Aegean_Mugla_Degirmendere_Anc:I20232,0.112685,0.165531,-0.031301,-0.076228,0.010463,-0.033746,0.00799,-0.004846,0.000205,0.034625,0.007795,0.007044,-0.01888,-0.004266,-0.016829,0.002254,0.025555,0.00266,0.007542,0.001126,-0.00262,-0.000866,0,0.004338,-0.001916
TUR_Aegean_Mugla_Degirmendere_Anc:I20233,0.112685,0.156392,-0.027153,-0.0646,0.003385,-0.023427,0.000705,-0.009461,0.003886,0.033714,0.00406,0.006894,-0.013677,-0.007019,-0.023072,0.008353,0.027902,-0.004307,-0.001634,-0.004752,-0.005116,0.001113,0.005053,0.003133,-0.001916
TUR_Aegean_Mugla_Degirmendere_Anc:I20257,0.113823,0.160454,-0.024513,-0.072998,0.011079,-0.027889,-0.00658,-0.002538,-0.007158,0.035718,0.000325,0.008542,-0.015609,-0.003578,-0.016965,0.001193,0.017471,0.003674,0.006536,-0.002126,-0.011729,0.005564,-0.003081,-0.00253,0.000599
TUR_Aegean_Mugla_Degirmendere_Anc:I20258,0.10927,0.161469,-0.031678,-0.070091,0.00277,-0.027052,0.003995,-0.007846,-0.009817,0.036265,0.007795,0.011839,-0.016055,0.003165,-0.009093,-0.005436,0.013299,0.004054,0.003645,-0.001251,-0.004617,0.01014,-0.006409,0.004699,-0.001197
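
Coordinates like these are usually compared by plain Euclidean distance across all 25 dimensions, and a group can be summarised by the per-dimension average of its samples. Here is a minimal sketch in Python, assuming each row is comma-separated with the label first. Only two of the rows above are included to keep it short, and the helper names are made up for the example.

```python
import math

# Two rows copied from the list above: label, then 25 coordinates.
RAW = """\
TUR_Aegean_Mugla_Degirmendere_Anc:I20226,0.10927,0.160454,-0.024513,-0.070737,0.00277,-0.027889,0,-0.002077,-0.008999,0.035172,0.001299,0.011989,-0.015907,0.007844,-0.016286,0.001326,0.018384,-0.004814,0.00993,-0.006753,-0.009109,0.003215,0.007025,-0.001566,-0.006227
TUR_Aegean_Mugla_Degirmendere_Anc:I20227,0.10927,0.152329,-0.031301,-0.069122,0.006463,-0.034025,0.00141,-0.004615,-0.011044,0.028611,-0.00065,0.004946,-0.009068,-0.003303,-0.018865,0.009281,0.026077,-0.005068,0.008045,-0.011881,-0.01148,0.000618,0.003204,0.003374,0.002155
"""


def parse_rows(text):
    """Parse 'label,v1,...,vN' lines into {label: [floats]}."""
    samples = {}
    for line in text.strip().splitlines():
        label, *values = line.split(",")
        samples[label] = [float(v) for v in values]
    return samples


def euclidean(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Per-dimension mean of a list of coordinate vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    samples = parse_rows(RAW)
    first, second = samples.values()
    print("distance:", round(euclidean(first, second), 4))
    print("centroid, first 3 dims:", centroid([first, second])[:3])
```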

Jovialis · Aug 10, 2023

Idontknowwhatimdoing said:

We have a lot of samples from 700-480 BC Mugla, West Anatolia. They are almost the same as the Mycenaeans from Attica but with a bit more Iran N. Scroll up; I posted the models.
A quick qpAdm chart I made. Many models have bad p-values, but the proportions shouldn't be affected.

These values look reasonably consistent with academic results. Good job!

Jovialis · Aug 10, 2023

I've downloaded R and RStudio, and I "think" I installed Admixtools2 properly with the prompt. I also physically downloaded the Admixtools zip from GitHub to my desktop. When I start up RStudio again, I run the Admixtools Rmd file in the vignettes folder there.

I also downloaded the tar files from the Reich lab... now I'm stuck. How does one prompt R to replicate the model you made? What kind of file structure should there be?

Jovialis · Aug 10, 2023

Regarding P-values and goodness of fit, I referred to ChatGPT, and this is what it gave me.

Thus, a P-value of 0.05 is generally considered a good fit?

Me:

what are considered to be coherent good fits for P-values in modeling with qpadm?

ChatGPT:

The qpAdm method is a statistical tool used in population genetics to model the ancestry of a target population as a mixture of two or more source populations. The method provides estimates of the proportion of ancestry derived from each source population and tests the fit of the proposed model using a P-value.

In the context of qpAdm and similar statistical methods, the P-value is a measure of the goodness of fit of the model. Specifically:

High P-values: A high P-value (close to 1) suggests that the observed data is consistent with the proposed model, indicating a good fit. In other words, the data does not provide evidence against the proposed admixture model.

Low P-values: A low P-value (close to 0) suggests that the observed data is not consistent with the proposed model, indicating a poor fit. This means that the data provides evidence against the proposed admixture model.

In general, a commonly used threshold for significance in many fields is a P-value of 0.05. However, the specific threshold for considering a P-value to be a "good fit" in the context of qpAdm may vary based on the study's design, the specific hypotheses being tested, and the conventions of the field.

If the P-value is greater than the chosen significance level (e.g., 0.05), the model is considered a coherent good fit.
If the P-value is less than or equal to the significance level, the model is not considered a good fit, and the hypothesis of the proposed admixture model is rejected.
It's essential to interpret P-values in the context of the specific study and to consider other factors, such as the quality of the data, the assumptions of the model, and the biological plausibility of the results.

Ralphie Boy · Aug 10, 2023

Francesco said:
Do you remember which study the Mugla samples are from? I would like to check in the supplementary materials whether they are associated with a Carian context or with a Greek one.

These samples are from the Southern Arc supplementals. Doing a quick search shows nothing about context for the Archaic samples other than the interior of Caria (page 292). The graves section for these samples is on pages 67-68.

Idontknowwhatimdoing · Aug 11, 2023

Jovialis said:
Regarding P-values and goodness of fit, I referred to ChatGPT, and this is what it gave me.

Thus, a P-value of 0.05 is generally considered a good fit?

I've heard the p-value threshold was lowered to 0.01, but I don't know which new study says this. I prefer to stick with 0.05, with 0.01 being acceptable in some cases.
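
To see what the two thresholds mean for the models posted earlier, here is a minimal sketch in Python that sorts them by p-value. The p-values are copied from the tables in the thread; the `passing` helper is hypothetical and only applies the simple rule "p at or above the threshold is not rejected".

```python
# p-values copied from the qpAdm tables earlier in the thread
P_VALUES = {
    "Greek_Cypriot": 0.0135,
    "Greek_Thessaloniki": 0.0271,
    "Greek_Athens": 0.000601,
    "Italian_South": 0.0424,
    "Sicilian": 0.000000354,
    "Sardinian": 5.21e-3,
    "Italian_North": 0.00271,
    "Spanish": 7.86e-10,
    "Lebanese_Christian": 0.000192,
    "Bulgarian": 0.00415,
    "Palestinian": 0.0251,
    "Norwegian": 0.0459,
}


def passing(p_values, threshold):
    """Targets whose model is not rejected at the given threshold."""
    return sorted(t for t, p in p_values.items() if p >= threshold)


if __name__ == "__main__":
    for threshold in (0.05, 0.01):
        print(f"p >= {threshold}: {passing(P_VALUES, threshold)}")
```

With these numbers no model reaches 0.05, and five of the twelve reach 0.01.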

Idontknowwhatimdoing · Aug 11, 2023

Jovialis said:
I've downloaded R and RStudio, and I "think" I installed Admixtools2 properly with the prompt. I also physically downloaded the Admixtools zip from GitHub to my desktop. When I start up RStudio again, I run the Admixtools Rmd file in the vignettes folder there.
I also downloaded the tar files from the Reich lab... now I'm stuck. How does one prompt R to replicate the model you made? What kind of file structure should there be?

These are the commands

Setting up the paths

Code:

# Path prefix of the genotype files (without the .geno/.snp/.ind extension)
prefix = 'D:/rich dadaset/data/few data/v54.1_1240K_public'
my_f2_dir = 'D:/rich dadaset/data/my_f2_dir'
outprefix = 'D:/rich dadaset/data/outprefix'
library(admixtools)
library(tidyverse)

For running a model.

Code:

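# Outgroup ("right") populations
right = c('Ethiopia_4500BP.DG', 'Morocco_Iberomaurusian', 'Turkey_Epipaleolithic', 'Turkey_Boncuklu_N', 'Iran_Wezmeh_N.SG', 'Russia_Samara_HG+Russia_Karelia_HG', 'Switzerland_Bichon.SG', 'China_AmurRiver_LPaleolithic')

# Source ("left") populations
left = c('Turkey_N', 'Israel_Natufian', 'Iran_TepeAbdulHosein_N.SG' , 'Russia_Samara_EBA_Yamnaya', 'Luxembourg_Loschbour.DG')

# Population to be modelled
target = c('Greece_BA_Mycenaean_Attica')

# Run qpAdm directly on the genotype files
results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights   # admixture proportions with standard errors
results$popdrop   # fit statistics (chisq, p) for the full model and its sub-models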

I combined Russia_Samara_HG and Russia_Karelia_HG into one group ('Russia_Samara_HG+Russia_Karelia_HG').

allsnps must always be TRUE or you get bad models.

Use the 1240k dataset for modelling ancients, because the HO is half coverage. Use the HO only for the extra modern samples.

Jovialis · Aug 11, 2023

^^Thanks for the advice, however it seems that my issue is with installing R 4.2.0

When I run the installer, it says it is complete, and I even see it as installed on the "Add/remove programs" section of my computer's control panel.

But there's no shortcut that generates after the installation, nor can I find the executable.

I had tried installing a newer version of R, and it worked fine, but it will not work for ADMIXTOOLS2.

My OS is Windows 11.

I thought it might be my anti-virus, so I turned it off and installed Rtools 4.2.0, but no luck with that either.

Jovialis · Aug 11, 2023

I suppose I could try to install Linux on my PC, but I have never used that OS in my life.

Also, just trying to get it to work on my PC would be another rabbit hole. I think I should just stick to getting this method to work. But it really boggles my mind why it isn't working, considering I followed the steps, even from a video.

Jovialis · Aug 12, 2023

So I've been playing around in ADMIXTOOLS2, and with the help of ChatGPT, I can better understand the results:

Here is running R437 against Italian_South.SG

I had to do some relabeling. Man, I am surprised at how poor some of the labeling is! I should really give people here a break for using nomenclature not used in the studies, because it seems like even the Reich lab's dataset is a little slapdash.

Code:

> results$weights
# A tibble: 1 × 5
  target           left                              weight       se       z
  <chr>            <chr>                              <dbl>    <dbl>   <dbl>
1 Italian_South.HO Italy_IA_Republic_Prenestini_o.SG      1 1.14e-13 8.79e12
> results$popdrop
# A tibble: 1 × 12
  pat      wt   dof chisq     p f4rank Italy_IA_Republic_Prenestini_o.SG feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>                             <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 0         0    19  25.4 0.149      0                                 1 TRUE     NA         NA        NA       NA

ChatGPT:

Let's interpret the provided results:

### 1. **results$weights**:

This section provides information about the estimated genetic contributions (or admixture proportions) of different source populations to the target population.

- **Italian_South.HO** has an estimated admixture proportion of **1** from the **Italy_IA_Republic_Prenestini_o.SG** population. This suggests that the genetic makeup of the `Italian_South.HO` population can be entirely attributed to the `Italy_IA_Republic_Prenestini_o.SG` population.

- The **se** (standard error) value is extremely small (`1.14e-13`), which indicates a high level of confidence in this estimation.

- The **z** value (`8.79e12`) is a measure of how many standard deviations away the observed value is from the expected value. A high absolute z-value indicates that the observed value is significantly different from the expected value. In this context, it suggests a very strong signal for the observed admixture proportion.

### 2. **results$popdrop**:

This section provides insights into the model's fit when considering or excluding specific source populations. It helps determine the significance of each source population in the admixture model.

- The **pat** column indicates which source populations are considered (`1` means included, `0` means excluded). Here, `0` means the source population is excluded.

- The **wt** column shows the weight of the source population in the admixture, which is `0` in this case.

- The **chisq** and **p** columns provide a statistical measure of how well the model fits the data. A p-value of `0.149` suggests that the model fits the data reasonably well, but not perfectly. Typically, a p-value below 0.05 would indicate a statistically significant result, but in this case, the p-value is slightly above that threshold.

- The **feasible** column indicates if the model is feasible. If `TRUE`, it means the model can be used to explain the genetic makeup of the target population.

In summary:

- The `Italian_South.HO` population's genetic makeup can be entirely attributed to the `Italy_IA_Republic_Prenestini_o.SG` population.

- The model fits the data reasonably well, as indicated by the p-value of `0.149`.

- The extremely small standard error suggests a high level of confidence in the estimated admixture proportion.
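
The p column in `results$popdrop` can be reproduced from the chisq and dof columns: it is the upper tail of a chi-square distribution with dof degrees of freedom. Below is a minimal sketch in plain Python (standard library only). `chi2_sf` is a helper written for this example, not part of ADMIXTOOLS 2, and the series it uses loses accuracy for very small p-values.

```python
import math


def chi2_sf(chisq, dof):
    """Upper-tail probability P(X >= chisq) for a chi-square variable.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x) with a = dof / 2 and x = chisq / 2, then returns 1 - P.
    """
    a = dof / 2.0
    x = chisq / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return 1.0 - total * math.exp(log_prefactor)


if __name__ == "__main__":
    # dof and chisq copied from the popdrop output above
    print(round(chi2_sf(25.4, 19), 3))
```

The result lands close to the reported 0.149; the small difference comes from chisq being printed with only three significant digits.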

Jovialis · Aug 12, 2023

Here is running R850 against Italian_South.SG:

Code:

> results$weights
# A tibble: 1 × 5
  target           left                          weight       se       z
  <chr>            <chr>                          <dbl>    <dbl>   <dbl>
1 Italian_South.HO Italy_IA_Republic_Latini_o.SG      1 1.14e-13 8.79e12
> results$popdrop
# A tibble: 1 × 12
  pat      wt   dof chisq     p f4rank Italy_IA_Republic_Latini_o.SG feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>                         <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 0         0    19  13.1 0.835      0                             1 TRUE     NA         NA        NA       NA

ChatGPT:

Let's break down the provided results:

### 1. **results$weights**:

This section provides information about the estimated genetic contributions (or admixture proportions) of different source populations to the target population.

- **Italian_South.HO** has an estimated admixture proportion of **1** from the **Italy_IA_Republic_Latini_o.SG** population. This suggests that the genetic makeup of the `Italian_South.HO` population can be entirely attributed to the `Italy_IA_Republic_Latini_o.SG` population.

- The **se** (standard error) value is extremely small (`1.14e-13`), which indicates a high level of confidence in this estimation.

- The **z** value (`8.79e12`) is a measure of how many standard deviations away the observed value is from the expected value. A high absolute z-value indicates that the observed value is significantly different from the expected value. In this context, it suggests a very strong signal for the observed admixture proportion.

### 2. **results$popdrop**:

This section provides insights into the model's fit when considering or excluding specific source populations. It helps determine the significance of each source population in the admixture model.

- The **pat** column indicates which source populations are considered (`1` means included, `0` means excluded). Here, `0` means the source population is excluded.

- The **wt** column shows the weight of the source population in the admixture, which is `0` in this case.

- The **chisq** and **p** columns provide a statistical measure of how well the model fits the data. A p-value of `0.835` suggests that the model fits the data well, as higher p-values indicate a good fit.

- The **feasible** column indicates if the model is feasible. If `TRUE`, it means the model can be used to explain the genetic makeup of the target population.

In summary:

- The `Italian_South.HO` population can be modelled with `Italy_IA_Republic_Latini_o.SG` as its only source, given the reference populations used.

- The model fits the data well, as indicated by the high p-value (`0.835`).

- The extremely small standard error is a by-product of having a single source whose weight is fixed at 1; it is not a measure of confidence.
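
The link between `chisq`, `dof` and `p` in the `popdrop` row can be checked by hand. The sketch below is written in Python rather than R so that it runs with only the standard library; `chi2_sf` is a helper name made up for this illustration and is not an admixtools function. It computes the chi-square upper-tail probability and applies it to the row above (chisq 13.1, dof 19, reported p 0.835).

```python
import math

def chi2_sf(chisq, dof):
    """Upper-tail probability P(X >= chisq) for a chi-square with integer dof."""
    y = chisq / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Values copied from the popdrop row above; chisq is rounded in the printout,
# so the result should agree with the reported p only to about two decimals.
p = chi2_sf(13.1, 19)
print(f"p = {p:.3f} (reported: 0.835)")
```

A small p-value from this calculation would mean the model is rejected; a value well above the usual 0.05 threshold, as here, means it is not.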

Jovialis · Aug 12, 2023

Here is what happens when you combine both in the model:

Code:

> results$weights
# A tibble: 2 × 5
  target           left                              weight    se      z
  <chr>            <chr>                              <dbl> <dbl>  <dbl>
1 Italian_South.HO Italy_IA_Republic_Prenestini_o.SG  -1.03  2.63 -0.391
2 Italian_South.HO Italy_IA_Republic_Latini_o.SG       2.03  2.63  0.772
> results$popdrop
# A tibble: 3 × 13
  pat      wt   dof chisq     p f4rank Italy_IA_Republic_Prenes…¹ Italy_IA_Republic_La…² feasible best  dofdiff chisqdiff
  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>                      <dbl>                  <dbl> <lgl>    <lgl>   <dbl>     <dbl>
1 00        0    18  10.5 0.916      1                      -1.03                   2.03 FALSE    NA         NA      NA  
2 01        1    19  26.4 0.120      0                       1                     NA    TRUE     TRUE        0      12.8
3 10        1    19  13.5 0.810      0                      NA                      1    TRUE     TRUE       NA      NA  
# ℹ abbreviated names: ¹Italy_IA_Republic_Prenestini_o.SG, ²Italy_IA_Republic_Latini_o.SG
# ℹ 1 more variable: p_nested <dbl>

ChatGPT:

The results provide insights into the genetic contributions of different source populations to the target population `Italian_South.HO`. Let's break down the results:

### 1. **results$weights**:

This section provides information about the estimated genetic contributions (or admixture proportions) of different source populations to the target population.

- **Italian_South.HO** has an estimated admixture proportion of **-1.03** from the **Italy_IA_Republic_Prenestini_o.SG** population. A negative weight cannot be read as an admixture proportion; it makes the model infeasible. With a standard error of `2.63` the estimate is also essentially unconstrained, which typically happens when the two sources are too similar for the reference populations to tell apart.

- **Italian_South.HO** has an estimated admixture proportion of **2.03** from the **Italy_IA_Republic_Latini_o.SG** population. A weight above 1 is equally uninterpretable, and its z-value of `0.772` shows it is not significantly different from zero. The two weights only tell us that this two-source model does not resolve the contributions.
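
The numbers in the `weights` table can be sanity-checked with a few lines. This is a minimal sketch in Python (standard library only); `z_score` and `is_feasible` are hypothetical helper names for illustration, and the feasibility rule used here (every weight between 0 and 1) is the usual reading of the `feasible` column.

```python
def z_score(weight, se):
    """z is simply the estimated weight divided by its standard error."""
    return weight / se

def is_feasible(weights, tol=1e-9):
    """A model is feasible when every weight lies in [0, 1]."""
    return all(-tol <= w <= 1.0 + tol for w in weights)

# (source, weight, se) copied from the two-source weights table above
rows = [
    ("Italy_IA_Republic_Prenestini_o.SG", -1.03, 2.63),
    ("Italy_IA_Republic_Latini_o.SG", 2.03, 2.63),
]

for source, weight, se in rows:
    print(f"{source}: z = {z_score(weight, se):.3f}")

weights = [w for _, w, _ in rows]
print("weights sum to", round(sum(weights), 6))
print("feasible:", is_feasible(weights))
```

Both z-values come out below 1 in absolute value, and the feasibility check fails, which matches the `FALSE` in the first `popdrop` row.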

### 2. **results$popdrop**:

This section provides insights into the model's fit when considering or excluding specific source populations. It helps determine the significance of each source population in the admixture model.

- The **pat** column encodes which source populations are dropped in each row (`1` means dropped, `0` means retained). For example, `01` means the first source is retained and the second is dropped, which is why that row shows a weight of 1 for Prenestini and `NA` for Latini.

- The **wt** column counts how many sources are dropped in that row.

- The **chisq** and **p** columns provide a statistical measure of how well the model fits the data. A higher p-value means a better fit; a low p-value (commonly below 0.05) means the model is rejected. The row with pattern `00` has a p-value of `0.916`, so it fits well statistically, but it is infeasible because of its weights.

- The **feasible** column indicates if the model is feasible. If `TRUE`, all estimated weights fall between 0 and 1, so they can be read as admixture proportions.

- The **best** column indicates if this is the best-fitting model among the ones considered. If `TRUE`, it suggests that this model provides the best explanation for the genetic makeup of the target population based on the source populations provided.

In summary:

- The two-source model is infeasible: the Prenestini weight is negative, the Latini weight exceeds 1, and both have large standard errors.

- No reliable proportions for the two sources can be read from this run, most likely because the sources are too similar to separate.

- The `popdrop` results show that the single-source model with only `Italy_IA_Republic_Latini_o.SG` (row 3) is feasible and fits well (p = `0.810`). The Prenestini-only model (row 2) is also feasible but fits less well (p = `0.120`), and the full model (row 1) has the highest p-value (`0.916`) but is ruled out as infeasible.
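
The `pat` codes can be decoded mechanically. The sketch below is Python (standard library only) and reads the convention straight off the table: in row `01` Prenestini has weight 1 and Latini is `NA`, so a `1` flags a dropped source. `decode_pat` and `acceptable` are made-up helper names, and the 0.05 cut-off is a conventional threshold assumed for illustration.

```python
SOURCES = ["Italy_IA_Republic_Prenestini_o.SG", "Italy_IA_Republic_Latini_o.SG"]

# (pat, p, feasible) copied from the three popdrop rows above
POPDROP = [("00", 0.916, False), ("01", 0.120, True), ("10", 0.810, True)]

def decode_pat(pat, sources):
    """Split sources into (kept, dropped); a '1' flags a dropped source."""
    kept = [s for flag, s in zip(pat, sources) if flag == "0"]
    dropped = [s for flag, s in zip(pat, sources) if flag == "1"]
    return kept, dropped

def acceptable(rows, alpha=0.05):
    """Feasible rows whose p-value is above alpha, highest p first."""
    ok = [r for r in rows if r[2] and r[1] > alpha]
    return sorted(ok, key=lambda r: r[1], reverse=True)

for pat, p, feasible in POPDROP:
    kept, dropped = decode_pat(pat, SOURCES)
    print(f"pat={pat} kept={kept} dropped={dropped} p={p} feasible={feasible}")

print("top acceptable pattern:", acceptable(POPDROP)[0][0])
```

Under these assumptions the Latini-only row ranks first among the feasible models, with the Prenestini-only row behind it.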

Jovialis · Aug 12, 2023

I sorted out the C6 samples from the Imperial era and ran them against ITS7, which, as I recall from Dodecad K12b, is the sample I'm closest to. It can be sourced completely from the Imperial-era C6 Romans.

Code:

> results$weights
# A tibble: 1 × 5
  target                left                 weight       se       z
  <chr>                 <chr>                 <dbl>    <dbl>   <dbl>
1 Italian_South_ITS7.HO C6_Italy_Imperial.SG      1 1.14e-13 8.79e12
> results$popdrop
# A tibble: 1 × 12
  pat      wt   dof chisq     p f4rank C6_Italy_Imperial.SG feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>                <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 0         0    20  14.6 0.801      0                    1 TRUE     NA         NA        NA       NA
>

ChatGPT:

The results you've provided come from an application of `qpadm`, which is a method used to estimate the proportions of ancestry from potential source populations in a target population based on genetic data. This method is frequently used in the field of archaeogenetics to model how present-day or ancient populations are admixed. Let's break down the results step by step:

1. **Computing Block Lengths for SNPs**:
- The analysis is based on 593,124 SNPs (Single Nucleotide Polymorphisms). These are locations in the DNA where individuals may have different bases (like A, T, C, G). Different SNP variants can be associated with ancestry from different populations.
- The analysis computed f4-statistics for 711 blocks. The f4-statistics are a measure of shared genetic drift between populations, which can indicate gene flow or common ancestry.

2. **Number of SNPs used for each f4-statistic**:
- This section lists the number of SNPs used to compute the f4-statistic for each combination of four populations (pop1, pop2, pop3, pop4). In this analysis, "Italian_South_ITS7.HO" and "C6_Italy_Imperial.SG" are constants for pop1 and pop2, respectively. "Mbuti.DG" is used as an outgroup (pop3) to provide a point of comparison. The pop4 varies across different ancient and present-day populations.

3. **Admixture weights**:
- The target population "Italian_South_ITS7.HO" receives an admixture weight of 1 from the source population "C6_Italy_Imperial.SG". With a single source this is the only possible value, since the weights must sum to 1. For the same reason the near-zero standard error (se) and the enormous z-score are artifacts of the constraint and do not measure confidence or significance.

4. **Number of Admixture Waves**:
- This section is assessing the fit of the admixture model. In this case, the `pat` value of 0 marks the row in which no source is dropped, that is, the single-source model itself. The chi-square value (chisq) of 14.6 with 20 degrees of freedom (dof) gives a p-value of 0.801. This high p-value indicates that the model fits the data well. If the p-value were low, it would suggest that the model does not adequately explain the genetic variation in the target population, and additional source populations might be needed.

In summary, from an archaeogenetic perspective, these results show that a model with "C6_Italy_Imperial.SG" as the only source is not rejected for "Italian_South_ITS7.HO" (p = 0.801), given the reference populations used. This is compatible with the target descending largely from a population like "C6_Italy_Imperial.SG", but it does not rule out other sources that the reference set cannot distinguish. The other populations listed in the f4-statistics section act as references for the test, not as tested sources.
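
Step 1 above mentions f4-statistics computed over blocks of SNPs. As a rough illustration of what that means, here is a minimal Python sketch (standard library only) of an f4-statistic with a delete-one-block jackknife standard error. It is a simplified stand-in, not the admixtools implementation: it assumes equal-sized blocks, uses made-up random allele frequencies instead of real genotypes, and the function names are hypothetical.

```python
import math
import random

def f4(freqs):
    """Mean of (p1 - p2) * (p3 - p4) over SNPs; freqs is a list of 4-tuples."""
    return sum((a - b) * (c - d) for a, b, c, d in freqs) / len(freqs)

def block_jackknife_f4(freqs, n_blocks):
    """Return (f4 estimate, jackknife standard error) using equal-sized blocks."""
    size = len(freqs) // n_blocks
    blocks = [freqs[i * size:(i + 1) * size] for i in range(n_blocks)]
    used = [snp for block in blocks for snp in block]
    # Recompute f4 with each block left out in turn
    loo = []
    for i in range(n_blocks):
        rest = [snp for j, block in enumerate(blocks) if j != i for snp in block]
        loo.append(f4(rest))
    mean = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in loo))
    return f4(used), se

# Synthetic, unrelated allele frequencies (illustrative only, not real data)
random.seed(0)
snps = [tuple(random.random() for _ in range(4)) for _ in range(2000)]
est, se = block_jackknife_f4(snps, n_blocks=20)
print(f"f4 = {est:.5f}, se = {se:.5f}, z = {est / se:.2f}")
```

Because the synthetic populations are unrelated, the f4 value should sit close to zero relative to its standard error; with real data, a large |z| is what signals shared drift.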

Palermo Trapani · Aug 12, 2023

RE: Posts 891 (IdontknowwhatIamdoing) and Jovialis posts 892-894.

IdontknowwhatIamdoing: I appreciate your recent post and your contributions to the forum. Numerous genetic studies have shown that people of Greek ancestry, Albanians, and Italians from the South and Sicily up to parts of Central Italy are closely related and overlap, so your skills combined with Jovialis's work can really help all of us better understand the ethnogenesis of those peoples.

Jovialis: Regarding your posts, is that Italian-South a macro measure of Campania, Calabria, Basilicata, Puglia and Sicily collectively, or just the mainland South? If it is just the mainland South, what do the models look like when Sicily is used, if you don't mind me asking?

Cheers, PT

Idontknowwhatimdoing · Aug 13, 2023

Jovialis said:

I sorted out the C6 samples from the Imperial era and ran them against ITS7, which, as I recall from Dodecad K12b, is the sample I'm closest to. It can be sourced completely from the Imperial-era C6 Romans.

Code:

> results$weights
# A tibble: 1 × 5
  target                left                 weight       se       z
  <chr>                 <chr>                 <dbl>    <dbl>   <dbl>
1 Italian_South_ITS7.HO C6_Italy_Imperial.SG      1 1.14e-13 8.79e12
> results$popdrop
# A tibble: 1 × 12
  pat      wt   dof chisq     p f4rank C6_Italy_Imperial.SG feasible best  dofdiff chisqdiff p_nested
  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>                <dbl> <lgl>    <lgl>   <dbl>     <dbl>    <dbl>
1 0         0    20  14.6 0.801      0                    1 TRUE     NA         NA        NA       NA
>

Can you send me the IDs of the C6 samples?

Jovialis · Aug 13, 2023

Idontknowwhatimdoing said:
Can you send me the IDs of the C6 samples?

Certainly,

I relabeled them according to this:

https://www.eupedia.com/forum/threa...lators)/page14?p=625783&viewfull=1#post625783
