Comparing Ancient Greek populations to modern Greeks and Italians

I would like to try to replicate the Southern Arc model. The samples are listed in the supplemental information. After doing that a few times, one could hone one's abilities enough to produce one's own models.

From that point, it would be interesting to start making prompts on threads where people could run that model against different populations of their choice.

I got very far with converting my VCF file. I managed to convert it in PLINK, but then I downloaded a 2014 precompiled version of EIGENSOFT for Windows, and it crashed fatally because it couldn't read the chromosome; the software is too old. I need to figure out how to run a newer version of EIGENSOFT so I can convert the files to EIGENSTRAT format and read them in ADMIXTOOLS.

If I could do that, I could load any file into ADMIXTOOLS.
 
Was the math it provided in the post incorrect?
Yeah. Sometimes it even has problems with addition or other basic stuff. I had to cancel my subscription after GPT-4 quality fell off over the past 3 months. Basically it became 3.5. Not sure what happened.
 
From all the ancient haplogroup samples I have seen from Greece, Albanians tend to have more cousin lineages than modern Greeks. And there is often a cousin lineage with an Adriatic one, or a near-Adriatic one like Croatia or Slovakia.
Ancient samples on YFull.
 
Nice, I was able to use Ubuntu Linux and install the latest version of EIGENSOFT. Moreover, I was able to convert the VCF from my Nebula WGS30x into EIGENSTRAT format!
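For anyone following along: EIGENSOFT's convertf is driven by a small parameter file (run as convertf -p parfile). Below is a minimal Python sketch that builds such a file for a PLINK binary set going to EIGENSTRAT. The file prefixes (my_wgs, my_wgs_eig) are hypothetical placeholders, and it assumes the PLINK step produced .bed/.bim/.fam files; it is not necessarily the exact setup used above.

```python
def convertf_par(prefix: str, out_prefix: str) -> str:
    """Build convertf parameter-file text for PLINK binary -> EIGENSTRAT."""
    lines = [
        f"genotypename:    {prefix}.bed",
        f"snpname:         {prefix}.bim",
        f"indivname:       {prefix}.fam",
        "outputformat:    EIGENSTRAT",
        f"genotypeoutname: {out_prefix}.geno",
        f"snpoutname:      {out_prefix}.snp",
        f"indivoutname:    {out_prefix}.ind",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical prefixes; replace with your own file names.
par_text = convertf_par("my_wgs", "my_wgs_eig")
print(par_text)
```

Save the printed text to a file and pass that file to convertf with the -p option.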

@Acrhetype, I hear what you say about ChatGPT, but honestly, I never would have been able to do this conversion and installation in a couple of days without it. It was able to generate the right prompts for all of the various installers and software, and it could read outputs and give me the right answers. Code Interpreter is phenomenal; you can even upload zip files for it to examine.
 
@Palermo:

ITS2.HO M Ignore_Italian_South_1d.rel.HO
ITS4.HO F Italian_South.HO
ITS5.HO F Italian_South.HO
ITS7.HO M Italian_South.HO
BEL57.HO M Italian_South.HO

This is what the Reich Lab lists as Southern Italian in the .ind file.

But one could edit it with free software like Visual Studio Code to examine the samples individually. Basically, qpAdm combines all samples that have the same label, so it is possible to rearrange them. I noticed many of them have very different (sometimes misleading or inaccurate) nomenclature for aDNA than is listed in the studies they originate from. For example, the Central Italian Neolithic samples are listed as Italian_South_N, which is bizarre.
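The relabeling idea can also be scripted instead of done by hand. Here is a rough Python sketch that parses the .ind lines quoted above (sample ID, sex, population label), groups samples by label, and gives one sample its own label so it could be modelled alone. The new label name ITS4_solo is made up for illustration.

```python
from collections import defaultdict

IND_TEXT = """\
ITS2.HO M Ignore_Italian_South_1d.rel.HO
ITS4.HO F Italian_South.HO
ITS5.HO F Italian_South.HO
ITS7.HO M Italian_South.HO
BEL57.HO M Italian_South.HO
"""


def parse_ind(text):
    """Parse .ind lines into (sample_id, sex, label) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def relabel(rows, new_labels):
    """Return rows with the label replaced for sample IDs in new_labels."""
    return [(sid, sex, new_labels.get(sid, label)) for sid, sex, label in rows]


def group_by_label(rows):
    """Map each population label to the list of sample IDs carrying it."""
    groups = defaultdict(list)
    for sid, _sex, label in rows:
        groups[label].append(sid)
    return dict(groups)


rows = parse_ind(IND_TEXT)
print(group_by_label(rows))

# Hypothetical label: split one sample out of the shared group.
solo = relabel(rows, {"ITS4.HO": "ITS4_solo"})
for sid, sex, label in solo:
    print(f"{sid:>10} {sex} {label}")
```

Samples that share a label are pooled into one population, so changing the third column is all it takes to split or regroup them.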

Thanks. Sorry I did not respond quicker; the internet is out at my home.
 
GPT is notably bad at math. It's 8 or 9 orders of magnitude off: it gave x10^-10, and you are looking for x10^-1 or x10^-2.
5x10^-2 is 0.05, and 5x10^-10 is 0.0000000005.
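A quick way to check this kind of arithmetic yourself is to let Python expand the scientific notation. This small sketch only uses the two values from this post:

```python
import math

target = float("5e-2")     # 5 x 10^-2
reported = float("5e-10")  # 5 x 10^-10

print(f"{target:.2f}")     # fixed-point form of 5e-2
print(f"{reported:.10f}")  # fixed-point form of 5e-10

# Difference in orders of magnitude between the two values.
orders = round(math.log10(target / reported))
print(orders)
```

Between 10^-2 and 10^-10 the gap is 8 orders of magnitude; measured against 10^-1 it would be 9.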

I'm totally ignorant about this, so forgive me if my question seems pretty stupid. What is the p value and does it need to be less than 0.05 or just near 0.05?
 
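In this case, over 0.05. In qpAdm the p-value tests how well the model fits, so a value above the cutoff means the model is not rejected.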
 
6ABNkO8.png


I did my best to replicate the modeling from Sarno et al. 2021. I believe I may have succeeded. P-value is good.

I copied the methodology from the study as best as I could when creating the components for the model, and the outgroups.

What I want to do is replicate studies and test different populations with them. I think it would be entertaining to run an academic model against my own WGS sample too.

ChatGPT:

The value "4.15e-2" is a scientific notation representation of the number \(0.0415\). In many statistical contexts, this is interpreted as a p-value.


A p-value is a measure used in hypothesis testing to help determine the significance of a result. It represents the probability of observing a test statistic as extreme as, or more extreme than, the statistic computed from the sample, given that the null hypothesis is true.


Typically, in many fields:
- A p-value less than \(0.05\) (or 5%) is considered statistically significant.
- A p-value less than \(0.01\) (or 1%) is considered highly statistically significant.
- A p-value greater than \(0.05\) is usually not considered statistically significant.


Given this general guideline, a p-value of \(0.0415\) or \(4.15e-2\) is considered statistically significant, but it's just below the common threshold of \(0.05\). It's important to interpret this p-value in the context of the specific study, the stakes of making a type I error (rejecting a true null hypothesis), and other relevant factors.

^^
Italian_South.HO

21.3% EMBA_Steppe
54.4% Anatolia_N
1.9% WHG
22.4% CHG_Iran_N

P-value: 0.0415
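As a sanity check on numbers like these, here is a small Python sketch that confirms the four proportions add up to 100% and compares the p-value against two cutoffs. The choice of cutoffs (0.05 and 0.01) is an assumption on my part; which one applies depends on the study being replicated. It follows the qpAdm convention that a model is kept when its p-value is above the cutoff.

```python
import math

# Proportions (in percent) and p-value as reported in the post.
model = {
    "EMBA_Steppe": 21.3,
    "Anatolia_N": 54.4,
    "WHG": 1.9,
    "CHG_Iran_N": 22.4,
}
p_value = float("4.15e-2")

total = sum(model.values())
print(f"total = {total:.1f}%")
# Allow a small tolerance for rounding of the reported percentages.
assert math.isclose(total, 100.0, abs_tol=0.15)

# Assumed cutoffs; a qpAdm model is retained when p is above the cutoff.
for cutoff in (0.05, 0.01):
    verdict = "passes" if p_value > cutoff else "fails"
    print(f"p = {p_value} {verdict} a {cutoff} cutoff")
```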
 
I don't know. Frankly, if the East Med samples in Rome aren't sufficient, it is a hard sell for me, considering that's primarily what people point to as the source.
Also, I have never heard of Slavic ancestry in Southern Italy. I only remember some guy named sikeliot/azzurro who pushed hard for that. Moreover, Germanic ancestry would be very difficult to demonstrate in the south. The Lombards, for example, did not leave much of an impact at all on the south.
The Lombards were mostly stationed in Benevento for over 400 years.

https://en.wikipedia.org/wiki/Duchy_of_Benevento
 
Molise Croats (Croatian: Moliški Hrvati) or Molise Slavs (Italian: Slavo-molisani, Slavi del Molise) are a Croat and Slavic community in the Molise province of Campobasso of Italy, which constitutes the majority in the three villages of Acquaviva Collecroce (Kruč), San Felice del Molise (Štifilić) and Montemitro (Mundimitar).[2] There are about 1,000 active and 2,000 passive speakers of the Slavomolisano dialect. The community originated from Dalmatian refugees fleeing from the Ottoman conquests in the late 15th and 16th centuries.[2][3]

Were they Slavic, or Dalmatians who learned the Slavic tongue?
 
Where did you get the WGS from, and how much was it?

Keep in mind that a good p-value doesn't matter if your references are not strict enough. You can make models get a good p-value if you don't give qpAdm proper references/right pops to check whether something else is needed. The way qpAdm works is that if something related to the references/right pops needs to be added to the left, it will give you a bad p-value.
 
I got Nebula Genomics WGS30x done a couple of years ago.

Right now you can get it for only $99.
 
When I did it, the cost was $500, I believe.

Now it is as cheap as 23andMe or Ancestry.

Honestly, with ChatGPT, the whole experience of setting up RStudio, Visual Studio Code, ADMIXTOOLS, PLINK, Ubuntu, and EIGENSOFT has been fairly straightforward, but still very rigorous, with a lot of setbacks. Nevertheless, I am close to getting it to work. I just need to make sure the VCF is in the same format as the Reich Lab data set. I got all the way to the end, where I was going to merge the geno, snp, and ind files together, but it failed because they were in a different format from one another.

Nevertheless, I think I need to download a couple more software programs to make the necessary changes to the VCF.
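One possible cause worth ruling out (this is only a guess, the exact mismatch is not stated above) is that the two .snp files name chromosomes differently, for example "chr1" in one and "1" in the other. A rough Python sketch of that check, using made-up example rows rather than real data:

```python
def chrom_labels(snp_text):
    """Collect the chromosome labels (2nd column) from .snp file text."""
    labels = set()
    for line in snp_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            labels.add(parts[1])
    return labels


# Made-up rows: SNP ID, chromosome, genetic pos, physical pos, ref, alt.
reference_snp = "rs0001 1 0.0 10000 A G\nrs0002 2 0.0 20000 C T\n"
my_snp = "rs0001 chr1 0.0 10000 A G\nrs0002 chr2 0.0 20000 C T\n"

ref_labels = chrom_labels(reference_snp)
my_labels = chrom_labels(my_snp)

# Labels that appear in my file but never in the reference file.
only_mine = sorted(my_labels - ref_labels)
print("labels only in my file:", only_mine)
```

If that list is not empty, the chromosome naming would need to be harmonised before any merge can work.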
 
It would be interesting to replicate this model for ancient populations as well, such as Italics and Greeks.
 
