Angela
We're pretty much on the same page, although not entirely.
We are making the same point with respect to calculators and Italians, just saying it in different ways.
Many of the "calculators" use Tuscans as the baseline for Italians, or, alternatively, the Tuscan samples are the largest in their databases. (This is similar to how many calculators use people from Utah of Northern & Western European ancestry as a proxy and the biggest baseline for what people think of as a generic north/west European.)
Utahans, who tend more often than not to be LDS and have an entirely different migration history than the rest of the country (more British, more Scandinavian, significantly less Irish), are not the perfect baseline for all Americans of western European heritage. In the same way, as you noted, Tuscans aren't always the best proxy for Southern Italians.
This causes Southern Italians, who have a right to "Italianness" as much as anyone else, to fall under other categories, with false positives and false negatives since they don't always match Tuscans closely. GIGO. So, we're saying the same thing on that point.
I'm sorry, but I don't think we're saying the same thing. That isn't how the ADMIXTURE program works, which is what these "calculators" are based upon. No one is comparing Southern Italians/Sicilians to Tuscans, or to Northern Italians for that matter.
Let's take a simple example where researchers are looking at modern groups alone. The algorithm is instructed to divide the data into a certain number of "cluster" or "K" groups. If you tell it to divide everybody into three groups you're going to basically get a "Caucasian" group, a "SubSaharan" African group, and a "Mongoloid" group. (Excuse the archaic terminology.) Depending on what the researcher is looking for, they might go to higher K. Some people like to limit it to 8 or so.
The clusters are usually then named for the area where they are "modal" or most frequent. In the calculators the genome of an individual is run through the same program and divided up into percentages of each cluster. That's what gives you your percent "Atlanto Med" or "Eastern European" etc. etc. Now, with the availability of ancient genomes our modern genomes are compared to them. That's how Tuscans, for example, get X percent Anatolian Neolithic or X percent WHG.
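For anyone who wants to see the "divided up into percentages of each cluster" step in code, here is a rough sketch. It is not the ADMIXTURE source or any particular calculator's code. It assumes the cluster allele frequencies are already fixed and uses a standard EM update for this kind of mixture model to estimate one person's proportions. The function names, the simulated frequencies, and the simulated individual are all hypothetical.

```python
import random


def estimate_proportions(genotype, cluster_freqs, iterations=100):
    """EM estimate of one individual's cluster proportions.

    genotype: list of 0/1/2 counts of the tracked allele, one per SNP.
    cluster_freqs: cluster_freqs[c][j] is the (fixed) frequency of that
        allele at SNP j in cluster c; values must lie strictly in (0, 1).
    """
    k = len(cluster_freqs)
    n_snps = len(genotype)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * k
        for j, g in enumerate(genotype):
            # Probability of the tracked allele under the current mixture.
            p = sum(q[c] * cluster_freqs[c][j] for c in range(k))
            for c in range(k):
                f = cluster_freqs[c][j]
                # Expected number of this person's two allele copies
                # at SNP j that trace back to cluster c.
                totals[c] += g * q[c] * f / p + (2 - g) * q[c] * (1 - f) / (1 - p)
        q = [t / (2 * n_snps) for t in totals]
    return q


def simulate_individual(true_q, cluster_freqs, rng):
    """Draw a made-up genotype for someone with proportions true_q."""
    genotype = []
    for j in range(len(cluster_freqs[0])):
        p = sum(w * freqs[j] for w, freqs in zip(true_q, cluster_freqs))
        genotype.append(sum(rng.random() < p for _ in range(2)))
    return genotype


if __name__ == "__main__":
    rng = random.Random(0)
    # Entirely simulated allele frequencies for three hypothetical clusters.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(1000)] for _ in range(3)]
    person = simulate_individual([0.7, 0.3, 0.0], freqs, rng)
    print([round(v, 2) for v in estimate_proportions(person, freqs)])
```

The estimates will only be as good as the clusters themselves, and with real data the clusters are far less cleanly separated than in this simulation.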
Only then do you get to the Oracle section, an algorithm first created and used by Dienekes if I'm not mistaken. The creator of the calculator has to then input modern sample reference populations for a comparison with the genome of the user. If there were no English reference sample, English people would come back as German or Norwegian or French or whatever, with very bad FST or goodness of fit numbers. With Italians it's even more important to get lots of reference samples because we have a lot of diversity. If there's no southern Italian reference sample then obviously the results for southern Italians are going to be garbage. On one of the early iterations of one of these calculators they only had a Tuscan sample to stand in for Italians. I wound up in the Balkans!
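A rough sketch of the Oracle idea, as I understand it: take the user's cluster percentages, compute a distance to each reference population's average percentages, and rank them. I'm assuming plain Euclidean distance here, and the cluster names, population labels, and every number below are made up purely for illustration.

```python
import math


def oracle_rank(user, references):
    """Rank reference populations by Euclidean distance to the user.

    user: dict of cluster name -> percentage for one person.
    references: dict of population label -> dict of cluster percentages.
    Returns a list of (label, distance), closest first.
    """
    ranked = []
    for label, ref in references.items():
        dist = math.sqrt(sum((user[c] - ref[c]) ** 2 for c in user))
        ranked.append((label, dist))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical percentages, not real calculator output.
user = {"cluster_1": 50, "cluster_2": 35, "cluster_3": 15}
references = {
    "ref_A": {"cluster_1": 55, "cluster_2": 30, "cluster_3": 15},
    "ref_B": {"cluster_1": 40, "cluster_2": 30, "cluster_3": 30},
    "ref_C": {"cluster_1": 70, "cluster_2": 10, "cluster_3": 20},
}

full = oracle_rank(user, references)

# Drop the user's own population from the references: the program still
# returns a "best match", just a worse one with a larger distance.
without_a = {k: v for k, v in references.items() if k != "ref_A"}
reduced = oracle_rank(user, without_a)

print(full[0], reduced[0])
```

The point of the second call is that the program never says "no good match available"; it just hands back whatever is closest among the references it was given.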
Things are getting better, though. There are now 4 Northern Italian samples, plus the academic Tuscan ones. There's also a sample from the Abruzzi, and there's a Western Sicilian sample, an Eastern Sicilian sample, and a Calabrian sample. One could argue the latter weren't the most representative areas from which to draw, and there should definitely be some samples from Campania and Apulia, but we have more areas sampled than a lot of other areas of Europe. Plus, in my opinion, there's rather more diversity in northern Italy than in southern Italy, perhaps because the latter was part of one governmental unit for so long (unlike northern Italy).
Anyway, your genome then gets compared to the genomes of these "reference" samples, and the similarity is computed. That's why southern Italians come out as southern Italian first and foremost. As I said upthread, they may sometimes get Ashkenazi as a second or third best match. However, people have to understand that groups can wind up with similar ancestral proportions, and therefore plot near each other on a PCA, not because of a long shared genetic history but just because by chance they were formed by similar ancient ancestral populations. That's why someone who is half Chinese and half English can wind up plotting near the Uyghurs. Or, as I said, maybe it's true that the Ashkenazim are partly descended from a southern Italian population.
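The "similar proportions without shared history" point can be shown with a toy calculation. This is only a sketch with two hypothetical ancestral components and invented numbers: a first-generation child of two very different parents lands right on top of an old admixed population that happens to have the same proportions.

```python
import math


def mix(parent_a, parent_b):
    """Expected component proportions for a child: the parents' average."""
    return [(a + b) / 2 for a, b in zip(parent_a, parent_b)]


def distance(p, q):
    """Euclidean distance between two proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Invented proportions of [component_east, component_west].
parent_east = [0.95, 0.05]
parent_west = [0.05, 0.95]
old_admixed_population = [0.50, 0.50]  # hypothetical, mixed long ago

child = mix(parent_east, parent_west)

print(distance(child, old_admixed_population))
print(distance(child, parent_east))
```

The child is much closer to the old admixed population than to either parent, even though the child has no recent ancestry from it at all.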
Now, perhaps you were thinking of the 23andme analysis. Even there, however, users are compared to the reference samples. There's lots of southern Italians on there who've been chosen to represent southern Italians as a reference sample. It's the northern Italians for whom 23andme only has the 12 person Bergamo sample and the paltry few who have tested. Even with just those few, northern Italian users still plot north on the PCA, southern Italian users plot south, and Tuscans plot sort of midway but closer to northern Italians. (I can't believe they're getting rid of their PCA btw.)
I do know what you're getting at in terms of the computation of the "Italian" percentage. As I said somewhere or other, it can seem as if they threw magnetized markers representing all the "Italian" genomes, academic and from their users, onto a board. Where the "cluster" is most dense, it's "Italian"; where you have markers being "pulled" toward other strong "clusters", they find "minority" ancestry, like the 20-25% Northwestern European that some Northern Italians get. (For Italians, all "Southern European" percentages on 23andme are, in my opinion, "Italian". It's just that 23andme is very conservative in calling the segments.)
This is why you get these figures from a few percent to 15-20% which 23andme labels "Middle Eastern/North African" in Southern Italians/Sicilians. First of all, people don't understand, or choose not to mention because of bizarre agendas, that on 23andme "Middle Eastern" doesn't mean Middle Eastern as everyone else in the world would understand it, i.e. Lebanon, Syria, Jordan, etc. It means Caucasus, Turkey, Iran. Now, that's ancestry that everyone in Europe carries (CHG), and certainly everyone in Italy carries. That number in Southern Italians just represents the "excess", in my opinion, over what the rest of the Italians carry. Then the question is when did it get to Italy?
The majority could not have come with the "Moors", in my opinion. They may be responsible for some of the 1 or 2% SSA that shows up for some people in certain calculators (some of it may be even more ancient), and the 2-3% NA that also shows up. A few percent more "Caucasus" could have arrived with them, but it has to be extremely minor because North Africans from Egypt all the way west just don't have much of it themselves. In fact, the entire legacy has to be pretty minor if uniparental markers mean anything at all, and I think they do. (The yDna is about, what, 6-7%, and the mtDna even less.)
So, it's more likely, in my opinion, that it came before that, and I doubt much of it is "slave" ancestry, for all the reasons Razib Khan pointed out, but also because of the Italian cline. Slaves came from all over the world, the north, west, and east as well as the Middle East and North Africa, and slaves went all over, to other parts of Italy and all other parts of the Roman world, as I'm sure you know but others may not. There was no regulation saying ok, all Middle Eastern slaves are going to go to southern Italy, and all northern slaves are going to go to northern Italy. It didn't work like that. So, it's probably earlier. My guess is late Neolithic/Bronze Age, maybe early Iron Age. It may have come with Indo-European speaking peoples for all we know, and I think most of it was mediated through the Greek mainland and islands. Part of my rationale for believing this is that in most calculators, for example, mainland Greeks have the same amount or higher levels of Caucasus (or "West Asian" in the older ones), and the rest of the Balkans isn't far behind.
(One can't use the "Greek" results on 23andme to analyze these migration patterns, no matter what the "usual suspects" on racist anthrofora may say when taking my words out of context, because the "Balkan" cluster at 23andme is a mess. It includes "Malta" for goodness' sake, so you're getting only the "excess" southern genes for Greeks and Balkanites.) The Spaniards are in a different situation because they have much more North African, and some areas have more SSA. Based on the recent excellent analyses of their mtDna, some of it is ancient, but some of it is definitely from the period of Moorish domination as well. In one recent analysis specifically of CHG or "Caucasus"/West Asian ancestry, the Spaniards seem to score from 25% in the far north to 30-31%, in comparison to 34-35% for Bergamo, and from 0-3 points more of WHG. However, they have slightly more Anatolian Neolithic. I can't provide the figures for southern Italians because they weren't included in that particular run.
A final word about ADMIXTURE and even 23andme. They are both very subject to drift, meaning that there may be more underlying genetic similarity between all Italians than either analysis shows.