Principal Component Analyses (PCA)-based findings are highly biased.


Yes, I saw it. Did you notice the author?

Anyone who knows anything about population genetics knows that PCAs give results based solely on the samples you choose, so it's relatively easy to manipulate the tool to show what you want it to show.

That's why I ignore some amateur-produced PCAs; I know the creator(s) aren't above "fiddling" with the samples they choose to include.

Plus, it's two dimensions.

Most "hobbyists" don't understand its inherent limitations.
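The point above about results depending on the chosen samples can be shown with a toy sketch. This is only an illustration under loud assumptions: the data are made-up Gaussian values rather than real genotypes, the three groups and the helper names `first_pc` and `simulate` are hypothetical, and NumPy is assumed to be available. The same three groups are sampled in two different proportions, and PC1 ends up pointing along a different axis each time.

```python
import numpy as np

def first_pc(samples):
    """Return the unit-length PC1 axis of a samples-by-features matrix."""
    centered = samples - samples.mean(axis=0)
    # Rows of vt are the principal axes, sorted by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def simulate(n_a, n_b, n_c, rng, n_features=20, shift=6.0):
    """Toy Gaussian data (not real genotypes) for three groups.

    Group B differs from A only on feature 0, group C only on feature 1.
    """
    mean_b = np.zeros(n_features)
    mean_b[0] = shift
    mean_c = np.zeros(n_features)
    mean_c[1] = shift
    return np.vstack([
        rng.standard_normal((n_a, n_features)),
        mean_b + rng.standard_normal((n_b, n_features)),
        mean_c + rng.standard_normal((n_c, n_features)),
    ])

rng = np.random.default_rng(0)
# Same three groups, two different sampling schemes.
pc1_mostly_ab = first_pc(simulate(100, 100, 5, rng))
pc1_mostly_ac = first_pc(simulate(100, 5, 100, rng))

# The sign of a principal axis is arbitrary, so compare absolute loadings.
print("Loadings on (feature 0, feature 1):")
print("  many A and B, few C:", np.round(np.abs(pc1_mostly_ab[:2]), 2))
print("  many A and C, few B:", np.round(np.abs(pc1_mostly_ac[:2]), 2))
```

In the first run PC1 is dominated by the A-versus-B contrast, in the second by A-versus-C, even though the underlying groups never changed. Only the sample counts did.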
 
Thank the powers that be. I never understood how PCA programs could be used effectively in genetics. You need a way to 'moor' the findings to a stable structure, such as a helix for DNA bits, or maybe even a map that identifies gravesites and migration routes, so that the values don't stretch to show you what you want.

The values twist, and sometimes completely flip into the data you were looking for, when you are dealing with a fourth dimension and not incorporating time as a factor.
But I haven't kept up with new findings. Reading the article, I hope the community moves away from PCA statistical modeling for mixed populations and focuses more on where certain bits of DNA are first found, but that's just my hope. I'll keep reading and watch what happens in the genomic community.
I'm just a hobbyist with biases against PCA modelling.
 
That's why I always use Y-DNA to verify my claims.

Y-DNA can wind up telling you little and sometimes nothing about who you really are.

My father, and practically every male in his corner of the world, was U-152. My mother bequeathed me a U2e lineage straight from the steppe.

Even my father was probably not more than 30% steppe, and my mother less. Both of them, and me, are mostly Anatolian Neolithic with some extra CHG/Iran Neo.

I don't deny or ignore the steppe ancestry, but it's only part of who I am and who they were.

Yes, uniparentals can help us track migrations, but they are not an identity.

Identity is composed of all of you, all your genome, your language, history and culture.

To fill in the pre-history or even the history before the medieval period, we need to use all the tools at our disposal: uniparentals, PCAs, Admixture, qpAdm, all of them, while understanding the limitations of each one.

You can't be a linear thinker in this discipline; it leads to tunnel vision and error.
 
In case I am misunderstood: I am not 100% against the use of PCAs. On the contrary.

PCAs can give you a rough idea of where to tread more often than not.

What I am against, and why I might come across as a zealot, is using them as some sort of gospel, especially when it comes to fine-tuning the minor or extremely minor components in order to rewrite history.

For example, I've seen on the web that one of the arguments used against Lazaridis' paper (2022), to counter its observation that EHG does not seem to exist in Anatolia, rests on the fact that the closed-source PCA of choice shows such admixture at 1-3%.

I don't find this line of argument serious.
 
^^It isn't serious science, but it's very serious politics.
 


All the criticisms he makes can be made of any multivariate analysis tool. He criticises PCAs because they are the most widely used.

For over 30 years, genetics has been full of studies with contradictory results and absurd conclusions (reductio ad absurdum), cherry-picking and circular reasoning. And certainly not only with PCAs.

Many genetic studies of the last 30 years have turned out to be highly biased in hindsight. It is really a problem of genetics (and geneticists) regardless. At amateur levels it becomes absolute anarchy.
 
It is good that Eran Elhaik is bringing this problem to the attention of the scientific community. The issue is real, and I have also tried to explain it, but it is a lot of work just to explain the issue, give some examples, show the limitations, etc.
Eran did some work on this. My point of view is quite different: I also see many advantages in PCA that Eran is not talking about, and nobody else is either.

There is also an issue with EHG, because the way they were identified is based on PCA, yet people don't look properly at this PCA projection. EHG should not be considered a different cluster; they are closely connected to WHG, with some Central Asian admixture.

CHG are a different story. There is no big connection between EHG and CHG.
 
"CAN BE highly biased" might be more accurate. Check this out, PC1 and PC2 map pretty close to latitude and longitude:

 




This PCA uses a very old set, has been in various papers for years, and is perhaps one of the worst around. So much so that SK (Slovakia) ends up behind Cyprus.
 
PCA studies should lay all their cards on the table: the dataset, the assumptions, and how much of the variance is accounted for by each PC. Anybody should be able to replicate their experiments/studies. Otherwise they are useless. The same criticism should apply to all studies and tools.
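For what it's worth, reporting the variance accounted for by each PC takes only a few lines. The sketch below is a rough illustration, not anyone's actual pipeline: the matrix is random toy data with one artificial group difference, the function name `explained_variance_ratio` is my own, and NumPy is assumed. It uses the standard relation that each PC's variance is proportional to the squared singular value of the centered data.

```python
import numpy as np

def explained_variance_ratio(samples):
    """Fraction of total variance carried by each principal component."""
    centered = samples - samples.mean(axis=0)
    # Singular values come back sorted from largest to smallest.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Toy data (an assumption, not real genotypes): 60 samples, 10 features,
# with half of the samples shifted on the first feature.
rng = np.random.default_rng(42)
toy = rng.standard_normal((60, 10))
toy[:30, 0] += 4.0

ratios = explained_variance_ratio(toy)
for i, ratio in enumerate(ratios[:4], start=1):
    print(f"PC{i}: {ratio:.1%} of variance")
print(f"PC1 + PC2 together: {ratios[:2].sum():.1%}")
```

The last line is the number that matters for any two-dimensional plot: whatever PC1 and PC2 do not cover is simply not on the page.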
 
