Comparing Ancient Greek populations to modern Greeks and Italians

Jovialis · Aug 14, 2023

I would like to try to replicate the Southern Arc model. The samples are listed in the supplemental information. After doing that a few times, one could hone one's ability to produce original models.

From that point, it would be interesting to start threads where people could run that model against different populations of their choice.

I got quite far in converting my VCF file. I managed to convert it in PLINK, but I had downloaded a 2014 precompiled version of Eigensoft for Windows, and it crashed fatally because it couldn't read the chromosome; the software is too old. I need to figure out how to run a newer version of Eigensoft so I can convert the files to Eigenstrat format and read them in Admixtools.

If I could do that, I could upload any file to admixtools.

Archetype0ne · Aug 14, 2023

Jovialis said:
The math it provided was incorrect in the post?

Yeah. Sometimes it even has problems with addition, or other basic stuff. I had to cancel my subscription after GPT4 quality fell off past 3 months. Basically it became 3.5. Not sure what happened.

BenRandy1 · Aug 14, 2023

From all the ancient haplogroup samples I have seen from Greece, Albanians tend to have more cousin lineages than modern Greeks. And there is often a cousin lineage with an Adriatic one, or a near-Adriatic one like Croatia or Slovakia.
Ancient samples on YFull.

Jovialis · Aug 14, 2023

Jovialis said:
I would like to try to replicate the Southern Arc model. The samples are listed in the supplemental information. After doing that a few times, one could hone one's ability to produce original models.

From that point, it would be interesting to start threads where people could run that model against different populations of their choice.

I got quite far in converting my VCF file. I managed to convert it in PLINK, but I had downloaded a 2014 precompiled version of Eigensoft for Windows, and it crashed fatally because it couldn't read the chromosome; the software is too old. I need to figure out how to run a newer version of Eigensoft so I can convert the files to Eigenstrat format and read them in Admixtools.

If I could do that, I could upload any file to admixtools.

Nice, I was able to use Ubuntu Linux and install the latest version of Eigensoft. Moreover, I was able to convert the VCF from my Nebula WGS 30x into Eigenstrat format!

@Archetype0ne, I hear what you say about ChatGPT, but honestly, I never would have been able to do this conversion and installation in a couple of days without it. It was able to generate the right prompts for all of the various installers and software, and it could read outputs and give me the right answers. Code Interpreter is phenomenal; you can even upload zip files for it to examine.

Jovialis · Aug 14, 2023

I am now set up for the ability to convert any VCF file into eigenstrat format to be examined in Admixtools.

VCF files can be generated from BAM and FASTQ files, but I currently only have the means to go from VCF to Eigenstrat format.
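For anyone following along, the PLINK-to-Eigenstrat step is driven by a small parameter file passed to Eigensoft's convertf. The sketch below only builds that file's text. The prefixes `my_wgs` and `my_wgs_eig` are hypothetical, and it assumes the VCF has already been converted to PLINK binary files (.bed/.bim/.fam); it is not necessarily the exact setup used in this thread.

```python
def convertf_par(prefix: str, out_prefix: str) -> str:
    """Build convertf parameter text for PLINK binary -> EIGENSTRAT.

    prefix and out_prefix are hypothetical file prefixes chosen for
    illustration; the input is assumed to be PLINK .bed/.bim/.fam.
    """
    lines = [
        f"genotypename:    {prefix}.bed",
        f"snpname:         {prefix}.bim",
        f"indivname:       {prefix}.fam",
        "outputformat:    EIGENSTRAT",
        f"genotypeoutname: {out_prefix}.geno",
        f"snpoutname:      {out_prefix}.snp",
        f"indivoutname:    {out_prefix}.ind",
    ]
    return "\n".join(lines) + "\n"


# Print the parameter text; save it to a file and pass that file to convertf.
print(convertf_par("my_wgs", "my_wgs_eig"))
```

Saved as, say, `par.PACKEDPED.EIGENSTRAT`, the file would be used as `convertf -p par.PACKEDPED.EIGENSTRAT`.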

Jovialis · Aug 14, 2023

Now with Eigensoft installed on my computer, I can endeavor to also make PCAs.

Jovialis · Aug 14, 2023

Looks like I may need to standardize and filter the VCF (or BAM) before even converting it to PLINK, and make sure it is in the same format as the Harvard set.
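The thread does not say exactly what the format mismatch was, so the following is only a guess at what "standardize and filter" might involve: dropping the `chr` prefix from chromosome names and keeping only biallelic SNPs on autosomes. The function names are made up for illustration, and a dedicated VCF tool would be the better choice for real data.

```python
AUTOSOMES = {str(n) for n in range(1, 23)}
BASES = {"A", "C", "G", "T"}


def normalize_chrom(chrom: str) -> str:
    """Turn 'chr1' into '1'; leave names without the prefix unchanged."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def keep_record(line: str) -> bool:
    """True for a biallelic single-base SNP on chromosomes 1-22."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    chrom, ref, alt = normalize_chrom(fields[0]), fields[3], fields[4]
    return chrom in AUTOSOMES and ref in BASES and alt in BASES


def filter_vcf_lines(lines):
    """Yield header lines unchanged and kept records with renamed chromosomes.

    Note: '##contig' header lines are NOT rewritten here, which a real
    tool would also need to do.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
        elif keep_record(line):
            fields = line.split("\t")
            fields[0] = normalize_chrom(fields[0])
            yield "\t".join(fields)
```

It works on any iterable of lines, so it could be fed an open (uncompressed) VCF file object.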

Palermo Trapani · Aug 14, 2023

Jovialis said:
@Palermo:

ITS2.HO M Ignore_Italian_South_1d.rel.HO
ITS4.HO F Italian_South.HO
ITS5.HO F Italian_South.HO
ITS7.HO M Italian_South.HO
BEL57.HO M Italian_South.HO

This is what the Reich Lab lists as Southern Italian in the .ind file.

But one could edit the file with free software like Visual Studio Code to examine the samples individually. Basically, qpAdm combines all the samples that have the same label, so it is possible to rearrange them. I noticed that many aDNA samples have very different (sometimes misleading or inaccurate) nomenclature from what is listed in the studies they originate from. For example, the Central Italian Neolithic samples are listed as Italian_South_N, which is bizarre.

Thanks. Sorry I did not respond sooner; the Internet is out at my home.
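The relabeling idea in the quoted post can also be scripted rather than done by hand in an editor. This is a minimal sketch that assumes the usual three-column Eigenstrat .ind layout (sample ID, sex, population label). The new label `ITS4_single` is hypothetical, chosen only to show how one sample could be split out and modeled on its own.

```python
def relabel_ind(lines, new_labels):
    """Return .ind lines with the population label replaced for chosen IDs.

    new_labels maps sample ID -> new label; other samples keep theirs.
    Lines that do not have exactly three columns are passed through.
    """
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            out.append(line.rstrip("\n"))
            continue
        sample_id, sex, label = parts
        out.append(f"{sample_id} {sex} {new_labels.get(sample_id, label)}")
    return out


# The five lines quoted above from the Reich Lab .ind file.
ind_lines = [
    "ITS2.HO M Ignore_Italian_South_1d.rel.HO",
    "ITS4.HO F Italian_South.HO",
    "ITS5.HO F Italian_South.HO",
    "ITS7.HO M Italian_South.HO",
    "BEL57.HO M Italian_South.HO",
]

# Hypothetical relabel: give ITS4.HO its own label.
relabeled = relabel_ind(ind_lines, {"ITS4.HO": "ITS4_single"})
for row in relabeled:
    print(row)
```

Since qpAdm groups samples by label, changing only the third column is enough to regroup them; the .geno and .snp files stay untouched.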

Francesco · Aug 14, 2023

Archetype0ne said:
GPT is notably bad at math. It's like 9 orders of magnitude off: x10^-10, when you are looking for x10^-1 or x10^-2.
5x10^-2 is 0.05, and 5x10^-10 is 0.0000000005.

I'm totally ignorant about this, so forgive me if my question seems pretty stupid. What is the p value and does it need to be less than 0.05 or just near 0.05?

Archetype0ne · Aug 15, 2023

Francesco said:
I'm totally ignorant about this, so forgive me if my question seems pretty stupid. What is the p value and does it need to be less than 0.05 or just near 0.05?

In this case over .05
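For readers unsure about the notation in the quoted post, here is a small sketch that converts scientific notation to plain decimals and measures how many orders of magnitude apart two values are. Python writes 5x10^-10 as `5e-10`; the helper names are made up for illustration.

```python
import math


def to_decimal(text: str, places: int) -> str:
    """Render a value written in e-notation as a plain decimal string."""
    return f"{float(text):.{places}f}"


def orders_apart(a: float, b: float) -> int:
    """Whole orders of magnitude between two positive numbers."""
    return round(abs(math.log10(a) - math.log10(b)))


print(to_decimal("5e-2", 2))       # 0.05
print(to_decimal("5e-10", 10))     # 0.0000000005
print(orders_apart(5e-1, 5e-10))   # 9
```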

Jovialis · Aug 15, 2023

I did my best to replicate the modeling from Sarno et al. 2021. I believe I may have succeeded. P-value is good.

I copied the methodology from the study as best as I could when creating the components for the model, and the outgroups.

What I want to do is replicate studies and test different populations with them. I think it would be entertaining to run an academic model against my own WGS sample too.

ChatGPT:

The value "4.15e-2" is a scientific notation representation of the number 0.0415. In many statistical contexts, this is interpreted as a p-value.

A p-value is a measure used in hypothesis testing to help determine the significance of a result. It represents the probability of observing a test statistic as extreme as, or more extreme than, the statistic computed from the sample, given that the null hypothesis is true.

Typically, in many fields:
- A p-value less than 0.05 (or 5%) is considered statistically significant.
- A p-value less than 0.01 (or 1%) is considered highly statistically significant.
- A p-value greater than 0.05 is usually not considered statistically significant.

Given this general guideline, a p-value of 0.0415 or 4.15e-2 is considered statistically significant, but it's just below the common threshold of 0.05. It's important to interpret this p-value in the context of the specific study, the stakes of making a type I error (rejecting a true null hypothesis), and other relevant factors.

^^
Italian_South.HO

21.3% EMBA_Steppe
54.4% Anatolia_N
1.9% WHG
22.4% CHG_Iran_N

P-value: 0.0415
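As a quick sanity check on the numbers above, the sketch below confirms that the four weights add up to 100% and compares the p-value against two cutoffs. In qpAdm the p-value tests whether the model fits, so a value over the cutoff means the model is not rejected, which is the "over .05" convention mentioned earlier in the thread. The 0.01 cutoff is a looser alternative assumed here for comparison; it does not come from this thread.

```python
import math

# Weights (percent) and p-value as posted above.
weights = {
    "EMBA_Steppe": 21.3,
    "Anatolia_N": 54.4,
    "WHG": 1.9,
    "CHG_Iran_N": 22.4,
}
p_value = 0.0415

# Admixture weights should sum to 100% (allowing for rounding).
total = sum(weights.values())
weights_ok = math.isclose(total, 100.0, abs_tol=0.05)
print(f"sum of weights: {total:.1f}% (ok: {weights_ok})")

# 0.05 is the cutoff mentioned in the thread; 0.01 is an assumed alternative.
verdicts = {}
for cutoff in (0.05, 0.01):
    verdicts[cutoff] = "not rejected" if p_value > cutoff else "rejected"
    print(f"cutoff {cutoff}: model {verdicts[cutoff]}")
```

Under the 0.05 convention, 0.0415 falls just short, while a 0.01 cutoff would accept the model, so the verdict depends on which threshold is chosen.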

torzio · Aug 15, 2023

Jovialis said:
I don't know, frankly. If the East Med samples in Rome aren't sufficient, it is a hard sell for me, considering that's primarily what people point to as the source.
Also, I have never heard of Slavic in Southern Italy. I only remember some guy named sikeliot/azzurro who pushed hard for that. Moreover, Germanic in southern Italy would even be very difficult to demonstrate in the south. The Lombards for example did not leave much of an impact at all on the south.

The Lombards were mostly stationed in Benevento for over 400 years.

https://en.wikipedia.org/wiki/Duchy_of_Benevento

torzio · Aug 15, 2023

Molise Croats (Croatian: Moliški Hrvati) or Molise Slavs (Italian: Slavo-molisani, Slavi del Molise) are a Croat and Slavic community in the Molise province of Campobasso of Italy, which constitutes the majority in the three villages of Acquaviva Collecroce (Kruč), San Felice del Molise (Štifilić) and Montemitro (Mundimitar).[2] There are about 1,000 active and 2,000 passive speakers of the Slavomolisano dialect. The community originated from Dalmatian refugees fleeing from the Ottoman conquests in the late 15th and 16th centuries.[2][3]

Were they Slavic, or Dalmatians who learned the Slavic tongue?

Idontknowwhatimdoing · Aug 15, 2023

Jovialis said:
I did my best to replicate the modeling from Sarno et al. 2021. I believe I may have succeeded. P-value is good.

I copied the methodology from the study as best as I could when creating the components for the model, and the outgroups.

What I want to do is replicate studies and test different populations with them. I think it would be entertaining to run an academic model against my own WGS sample too.

Where did you get your WGS from, and how much was it?

Keep in mind that a good p-value doesn't matter if your references are not strict enough. You can get models to show a good p-value if you don't give qpAdm proper reference/right populations to check whether something else is needed. The way qpAdm works is that if something related to the reference/right populations needs to be added to the left, it will give you a bad p-value.

Jovialis · Aug 15, 2023

Idontknowwhatimdoing said:
Where did you get your WGS from, and how much was it?

Keep in mind that a good p-value doesn't matter if your references are not strict enough. You can get models to show a good p-value if you don't give qpAdm proper reference/right populations to check whether something else is needed. The way qpAdm works is that if something related to the reference/right populations needs to be added to the left, it will give you a bad p-value.

I got Nebula Genomics WGS30x done a couple years ago.

Right now you can get it for only $99.

Jovialis · Aug 15, 2023

Jovialis said:
I got Nebula Genomics WGS30x done a couple years ago.

Right now you can get it for only $99.

When I did it, the cost was $500 I believe.

Now it is as cheap as 23andMe or Ancestry.

Honestly, with ChatGPT, the whole experience of setting up RStudio, Visual Studio Code, Admixtools, PLINK, Ubuntu, and Eigensoft has been fairly straightforward, though still very rigorous, with a lot of setbacks. Nevertheless, I am close to getting it to work. I just need to make sure the VCF is in the same format as the Reich Lab data set. I got all the way to the end, where I was going to merge the geno, snp, and ind files together, but it failed because they were in different formats from one another.

Nevertheless, I think I need to download a couple more software programs to make the necessary changes to the VCF.

Idontknowwhatimdoing · Aug 15, 2023

Jovialis said:
I got Nebula Genomics WGS30x done a couple years ago.

Right now you can get it for only $99.

I'm going to do that next then, if it's only $99.

Jovialis · Aug 15, 2023

Yeah, I think it would be worth it.

Analyzing a personal 30X WGS sample in Admixtools2 and Eigensoft is really the pinnacle of this hobby.

Jovialis · Aug 15, 2023

Personally, I have always sought out tests that would best emulate what academic studies show. But with Admixtools, I can literally recreate those academic models, which is better than anything a direct-to-consumer test can provide.

Francesco · Aug 15, 2023

Jovialis said:
I did my best to replicate the modeling from Sarno et al. 2021. I believe I may have succeeded. P-value is good.

Italian_South.HO

21.3% EMBA_Steppe
54.4% Anatolia_N
1.9% WHG
22.4% CHG_Iran_N

P-value: 0.0415

It would be interesting to replicate this model for ancient populations as well, such as Italics and Greeks.
