
(QUESTION) How to model Central Italians in qpAdm

Just as an aside, Italy_Bivio_Roman.SG is essentially the AADR "official version" of C6.

It is also Central Italian; I believe it comes from Umbria. It is not the other Bivio near the Swiss border.

Italy_Bivio_Roman is most likely a sample (or the average of four individuals?) from Bivio della Croce dei Missionari, a few km outside Urbino (Pesaro and Urbino province, Marche region).

g1bgSWP.png


lVgOtBV.png
 
It is a set of 4 samples.

Thanks for the image, looks like it was Marche.
 
I'm not claiming this is literally true, but in a broad sense some of the "Langobard" signal could be north-of-the-Alps ancestry that arrived at any point in time.

I ran f4(Target, Source; Right_k, Right_l) across all pairs of right (outgroup) populations. I have worked the kinks out of the code and will post it later.
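For anyone who wants to follow along before the actual code is posted, here is a rough sketch of the idea in Python. It is not the code used for the results in this post. It assumes you already have per-population allele frequencies as NumPy arrays (one value per SNP, same SNP order everywhere), and it uses a plain equal-block jackknife as a simplification; the function names, population labels and the random toy data are all made up for illustration.

```python
from itertools import combinations

import numpy as np


def f4(freqs, a, b, c, d):
    """f4(a, b; c, d): mean over SNPs of (p_a - p_b) * (p_c - p_d)."""
    return float(np.mean((freqs[a] - freqs[b]) * (freqs[c] - freqs[d])))


def f4_jackknife(freqs, a, b, c, d, n_blocks=10):
    """Return (estimate, z) from a simple delete-one-block jackknife.

    Assumption: SNPs are split into n_blocks contiguous, roughly equal
    blocks. This is a simplification, not a weighted jackknife over
    genetic-distance blocks.
    """
    prod = (freqs[a] - freqs[b]) * (freqs[c] - freqs[d])
    total, n = prod.sum(), prod.size
    blocks = np.array_split(prod, n_blocks)
    # Leave-one-block-out estimates of the mean.
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    est = total / n
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    z = est / se if se > 0 else float("nan")
    return float(est), float(z)


def f4_all_right_pairs(freqs, target, source, right):
    """f4(target, source; R_k, R_l) for every unordered pair of right pops."""
    return {
        (rk, rl): f4_jackknife(freqs, target, source, rk, rl)
        for rk, rl in combinations(right, 2)
    }


if __name__ == "__main__":
    # Toy data only: random frequencies, hypothetical labels.
    rng = np.random.default_rng(0)
    pops = ["Target", "Source", "Right1", "Right2", "Right3"]
    toy = {p: rng.uniform(0.05, 0.95, size=1000) for p in pops}
    results = f4_all_right_pairs(toy, "Target", "Source", pops[2:])
    for pair, (est, z) in results.items():
        print(pair, round(est, 5), round(z, 2))
```

The point of scanning all right pairs is to see which outgroups pull the target away from the source: pairs with large |Z| are the ones where the two are not symmetrically related to the right set.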


View attachment 19124

I can actually be modeled similarly, with a weaker but passable signal against the Langobards:
View attachment 19125

The model fails for North Italians however, making it less plausible overall in my eyes:

View attachment 19126

I agree that part of this "Langobard" component likely comes from several other waves of north-of-the-Alps ancestry that get aggregated as Langobard in these kinds of models. But I still think that Germanics likely played some role, as shown in this post:
Perhaps he was referring to Central/Southern Italians; that statement is more true for them, though I still don't agree with such an oversimplification.
I think that in various areas of Italy there is some degree of continuity. The Balkan shift is still noticeable in Adriatic Italy, Umbria is too EEF-shifted to be derived from Imperial Roman + Germanic, Southern Italy tends to lack the proper late IA sources, etc.
The Aegean influx before the Imperial Roman period and the later Byzantine contribution (which I think is underestimated in Southern Italy, where it even left the Greek language, once more widespread, spoken to this day) are surely part of the shift, which can't be attributed solely to Imperial Anatolian/Mesopotamian ancestry.
But I think we agree on this, since you were one of the first to point this out.
I also suspect that if inland Apennine areas and remote villages are analyzed, instead of big cities and coastal areas, we will see more conservative profiles, with a greater native contribution, richer in EEF and lower in Anatolia_BA.

Here are the figures from the study:
View attachment 19127

View attachment 19128


We can see that, aside from the outliers, the Germanic component is present, albeit in low amounts, along with a bit of Baltic_BA, likely derived from Slavs, mediated by Byzantines and/or picked up by Langobards/Goths.
It is interesting to note some amount of this pink component (EstoniaBA) in the Medieval Foggia samples.

Medieval Italians show affinity toward Germanics via IBD (and most of the samples are from Central Italy, from what I've seen, so this figure may be slightly higher in Northern Italians).

As also mentioned in this thread, this qpAdm model is expected to fail for North Italians. Even if IA North Italians were identical to IA Central Italians (Etruscan-like) before the arrival of the C6 component, Central Italy received more of the C6 component than North Italy, so before the Middle Ages, when the Langobards/Ostrogoths arrived, Imperial-era North Italians were almost certainly already different from Central_Italy_Imperial. I suggest adding Verona_LIA, or at least Italy_IA_Republic, to control for this ancestry in North Italians until we have better IA North Italian sources. It's also possible that IA North Italians already had a different profile from IA Central Italians (due to several possible sources, such as Picenes, Gauls, etc.), which further complicates this model. By the way, I plan to comment more on Picenes and redo my models/PCA with them when I get the data for qpAdm/smartpca analysis. The use of Italian_North.HO may also overestimate the role of Bergamo, which shows more affinity towards EEF in PCAs.

Another suggestion I have is to add more populations to the outgroup list. I think you can use less ancient populations, since the admixture events we are measuring are much more recent than the populations you used. I also think it's a good idea to consider rotating the sources. If you have 2 possible sources (e.g. Germanics vs Picenes), add the one you are not using in the model as an outgroup (you can apply this logic to more sources as long as they are not extremely similar). This method was shown to improve qpAdm's detection of the true source (by rejecting the models with the wrong sources) in Harney et al. 2021.
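To make the rotating idea concrete, here is a small Python sketch that only builds the list of models to run (left sources vs right outgroups). It does not call qpAdm itself. The function name and all population labels in the example are placeholders I made up, so swap in whatever is in your dataset.

```python
from itertools import combinations


def rotating_models(target, candidates, base_right, n_sources=2):
    """Build rotating-style models.

    For each choice of n_sources left populations from `candidates`, the
    candidates NOT used as sources are appended to the right (outgroup)
    list, on top of the fixed `base_right` outgroups.
    """
    models = []
    for left in combinations(candidates, n_sources):
        unused = [c for c in candidates if c not in left]
        models.append(
            {
                "target": target,
                "left": list(left),
                "right": list(base_right) + unused,
            }
        )
    return models


if __name__ == "__main__":
    # Placeholder labels for illustration only.
    models = rotating_models(
        target="Italian_Central.HO",
        candidates=["Central_Italy_Imperial", "Germanic", "Picene"],
        base_right=["Outgroup_A", "Outgroup_B", "Outgroup_C"],
    )
    for m in models:
        print(m["left"], "| right:", m["right"])
```

Each model would then be run separately. A model whose wrong source is contradicted by the competing source sitting in the right set should get a low p-value and be rejected, which is the behaviour described above.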
 

Yes, it is in the northernmost area of the Marche region, where a northern Italian dialect is spoken. It is therefore more of a transition area between central and northern Italy, on the easternmost side, along the Adriatic coast. This Roman necropolis is located about 35-40 km from the Picene necropolis of Novilara.
 
I’ve come up with an example so that everyone can understand the current problem with PCAs and the bias that is applied to each study, and how to correct that bias in order to understand in a neutral way the problems you have with modelling. This should not focus on building random triangulations until finding an “optimal” result — that distorts reality. PCAs are something more like circles trying to become spheres: 4 dimensions, where autosomes coincide (1 Y + 2 Mt), 3 time and 4 space. That’s why, in order to model properly, you need populations that do not overlap too drastically.

The problem is that in practice all this is translated only into 1 autosomes and 2 space.

U152* shares a lot of IBD with Hallstatt populations, but they have different STRs; this is very often confused with “Germanic”.

The goal of PCAs and modelling should be that they are universal, not to deform them every time we find 4 new samples.


IMG_5901.png

IMG_5899.jpeg

In both Jovialis’ PCA and the academic study below, I check the population control triangle that I always use (at a glance: Basques, Sardinians and Iberians) and make sure it forms a right triangle, so that the distances are well balanced, keeping Sardinians as the most EEF population of any today.

Another study that used this same projection was the Phoenician study.

In all these cases the zero axis would be something like an Anatolian with modern Greek affinities. But in that zone there is only one — there is kind of a blank sphere there because that axis is reserved for “nobody”, it is point 0. If someone falls there it must be due to admixture; there cannot be pure populations exactly on axis 0.

It’s shocking how many studies make this mistake; I’ve already seen it in many. Anyone who does not take this into account in their modelling will have something wrong.


IMG_5898.jpeg

In this other study, which has to rotate the PCA with respect to the previous ones, the axis is deformed in the direction of Iran N, which makes the control triangle become isosceles. This inflates the % CHG in Imperial Romans, and what appears as 25% Iran would in reality be around 12.5%, which would match the mixture with Southern Italians and Sicilians, things from the Aegean with high CHG — not something that just arrived from Turkey, nor Lebanon, nor Syria, nor Armenia, nor Iran.

In this case it is a modern Greek who falls on axis 0 — wrong. What that PCA is measuring is how Greek the Imperial Romans were, and it fails for that reason.

Something that is not being measured but must exist is that Sicilians have more Natufian than Aegeans, probably due to the Sea Peoples > Phoenician period between 1400–400 BC.



IMG_5897.png

Extreme case. In this other one they have deformed the axis so much towards EHG–ANE that the control triangle becomes obtuse, which causes weird things like Northern and Southern Italians swapping their natural positions because the South has more CHG of the same type that exists inside EHG–ANE.

In this case, Austrians fall on axis 0.

That is the big problem of the “steppe theory”. The axis I use is already deformed by Harvard by default: Sardinians are 80% EEF, Iberians and Italics 50–75% EEF.

CHG–ANE is always hugely inflated; the South is always assigned 20–25%, but in reality it never goes above 10%.

In G25, on “axis 0” you also find Germans–Austrians–Hungarians.

Once there was an Austrian painter who said that his ethnicity was axis 0…

G25, broadly speaking, measures how “Germanic” you are.

Axis 0 should be occupied by populations between the Black Sea and the Aegean, basically because from there they split between 10,000–5000 BC.


In summary, if I were a professor of archaeogenetics supervising a thesis and you came to me with a PCA where the centre is occupied and certain axes — based on populations that have not changed in 5000 years and that I use as “control points” like Basques, Sardinians and Iberians — are deformed, I wouldn’t just fail you, I’d slap you twice.

Axis 0 does not exist; it is you yourself who must “control” that no present-day population ends up on that axis, so as not to deceive yourself or anyone who does not fully understand this fact.

Modelling consists of placing the population you want to model on axis 0, but that is not valid for a general explanation.

Modelling is one thing and making ancestral projections is something very different.

That is, first you find the axis 0 of the population to be modelled, and then for the study you use a “neutral” axis 0.
 
Interestingly, with a single left source, with this right panel, I come out as statistically indistinguishable from Italian_Central.HO, using both ALLSNP=TRUE & F2-Block methods:

1770040888613.png
 