How to evaluate genetic analysis

Angela

Actually, it should probably be titled 'how to evaluate scientific papers', or 'how to evaluate statistical analysis'.

The major takeaway seems to be: skepticism is in order.

This is just one example:

See: http://nymag.com/scienceofus/2017/0...or-problem.html?mid=twitter-share-scienceofus
Given the practices that the director unwittingly revealed, I personally now have no faith in any papers from that lab, even though some of them may be legitimate: having no access to any of the runs, I can't evaluate them.

This provides some guidance on how to evaluate results:

"
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so—and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting."

https://link.springer.com/article/10.1007/s10654-016-0149-
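
To see how "selecting analyses for presentation based on the P values they produce" works in practice, here is a rough simulation of my own, not anything from the paper: every outcome below is pure noise, yet reporting only the smallest of 20 p-values "finds" significance most of the time.

```python
# Illustration (my own, not from the cited paper): test 20 outcomes that are
# all pure noise and report only the smallest p-value.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, n_outcomes, n_per_group = 2000, 20, 30
hits = 0

for _ in range(n_sims):
    p_values = []
    for _ in range(n_outcomes):
        # Both groups are drawn from the same distribution: the null is true.
        a = rng.normal(size=n_per_group)
        b = rng.normal(size=n_per_group)
        p_values.append(ttest_ind(a, b).pvalue)
    if min(p_values) < 0.05:  # "select the analysis that worked"
        hits += 1

print(f"Chance of reporting p < 0.05 somewhere: {hits / n_sims:.2f}")
# Roughly 1 - 0.95**20 ≈ 0.64, not the nominal 0.05.
```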


See also the following for the data that would have to be retained by scientists and made available to anyone wishing to look at the findings critically:

"#SLAS2017 Bioinformatics at @bmsnews adopted the @PLOS editorial's "10 simple rules for reproducibility" http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285 …
 
I never perceived myself as especially picky with regard to the quality of scientific papers. That changed when I began to read papers about population genetics. By now I agree with an astronomer with whom I had a discussion many years ago, who stated:
'The so-called human sciences aren't science at all!'
...and he wasn't even being cynical.

To break it down to a simple observation: 'real' science is based on robust data, and statistics is there to support and complement the results of the research. Many proponents of the said 'human sciences', on the other hand, have statistics as the basis of their scientific results, and those statistics are often not even founded on hard facts, but on models and assumptions. You go to Pisa, build a tower on sand, and a few years later you wonder why it's so lopsided, even though you were sooo accurate with your measurements... It doesn't even take bad handling or interpretation of your statistics; it's the statistics themselves that cause the shoddiness of much scientific work.

But not only that. A lot of papers are sloppy in precisely describing their scientific goals, and the description of the data sampling is often close to hilarious.
Let's look at an example (from Di Giacomo 2003, 'Clinal patterns of human Y chromosomal diversity....'):

"We analyzed blood samples from a total of 890 male subjects with known parental and grandparental origins: 524 Italians (locations 1–17 in Table 1 and Fig. 1), 154 individuals from continental Greece (locations 18–24), 212 subjects from Crete (locations 25–28), and the Aegean islands of Lesvos and Chios (locations 29–30). Informed consent was obtained in all cases. Sampling was anonymous in order to prevent link to the original donor."
That's all? Really?? The most important thing, the data collection, in three sentences without specifics! Think of a staff member of your municipal water supplier who takes some samples from various locations and writes down:
"We took water samples from 150 households in region xxx. Consent was obtained in all cases, and the samples were anonymised. Period."
He'll get the boot if I'm the one in charge of the project, that's for sure!

The problem is, these 'scientists' obviously don't even know what to write in a proper report. Add to this insufficient work in the lab, bad statistics, wrong models, and ridiculous interpretations of the data. Admittedly, research budgets are usually very small and you have to cut costs at every single step, but squeezing results out of a lack of data with the help of questionable models and tweaking the statistics to your purpose is simply wrong.

And another one...
Good scientist:
Curiosity -> research -> find a pattern -> pin it down, find a rule/law -> try to falsify it -> form a theory -> present it -> accepted -> thesis.
Bad scientist:
I have an idea/theory -> find evidence to support it -> swat the mosquitoes that annoy you (the opponents, that is) -> present -> accepted -> guru.
Guess where you find more of the first kind, and in which scientific sector you find A LOT of the second group?

Ok, rant over - expecting flak!
 
Now we know. There are not that many geniuses, and too many "scientists" with poor common sense, faulty pattern recognition, and a heightened spirituality that makes them see things and draw conclusions that don't exist. Should we blame ourselves for this? We are much better tuned for hunting, making cheese and praying to gods than for understanding the inner workings of the world.
Perhaps AI will soon help a lot with understanding statistics and weeding out bad studies.
 
Imo, in line with what LeBrok noticed, the articles are a subtle indictment of the intellect of the scientists, in that many of them don't have the mathematical ability or training to understand how to use statistics. It's also an indictment of their sloppiness, or perhaps their ethics. If your work is not completely transparent, and every step isn't documented, how can anyone check to see whether it is correct or not? That's why reproducibility is such a big problem. Now go down one level to amateurs and imagine what you get. I've only ever seen one amateur who was obviously a gifted mathematician.

Good science depends on good math. You start with a hypothesis. You test it. You make sense of it and interpret it through the use of statistics, and yes, you create models. That's how they build nuclear reactors for submarines, or design new planes. It's all applied mathematics. If you don't understand higher level statistics, you're going to produce shoddy results.
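
As a toy version of that hypothesis-test-interpret loop (my own sketch, with made-up numbers rather than data from any real study): state the hypothesis, collect the two samples, and let a permutation test say how surprising the observed difference would be if the hypothesis were false.

```python
# Toy hypothesis test on made-up data: does "treatment" shift the mean upward?
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(loc=0.0, scale=1.0, size=40)
treatment = rng.normal(loc=0.8, scale=1.0, size=40)   # simulated true shift of 0.8

observed = treatment.mean() - control.mean()

# Permutation test: how often does random relabelling of the 80 values
# produce a difference at least as large as the one we observed?
pooled = np.concatenate([control, treatment])
n_perm, count = 10_000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    if pooled[40:].mean() - pooled[:40].mean() >= observed:
        count += 1

print(f"observed difference = {observed:.2f}, one-sided p ≈ {count / n_perm:.4f}")
```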

Here is a treatment of the career of Nick Patterson at the Reich Lab which I just saw. He's basically a mathematician. He has used his talent for applied mathematics in numerous fields, and now he's using it in genetics. He creates the algorithms that amateurs play around with.
https://www.radcliffe.harvard.edu/news/radcliffe-magazine/man-who-breaks-codes
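
For anyone curious what those algorithms actually compute, one of the simplest building blocks in Patterson's published work is the f4-statistic: the average across SNPs of (pA - pB)(pC - pD), where the p's are allele frequencies in four populations. The sketch below just implements that definition on made-up frequencies; it is not the Reich lab's code.

```python
# Rough sketch of an f4-statistic (average over SNPs of (pA-pB)*(pC-pD));
# the allele-frequency arrays here are made up for illustration only.
import numpy as np

def f4(pA, pB, pC, pD):
    """f4(A, B; C, D): average product of allele-frequency differences across SNPs."""
    return float(np.mean((pA - pB) * (pC - pD)))

rng = np.random.default_rng(2)
n_snps = 5000
pA, pB, pC, pD = rng.uniform(0.05, 0.95, size=(4, n_snps))

print(f"f4(A, B; C, D) = {f4(pA, pB, pC, pD):+.4f}")
# Close to zero here because the toy frequencies are independent; shared drift
# or gene flow between the population pairs would push it away from zero.
```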
 
"Patterson deciphers genetics by using mathematical models, computer algorithms, and statistical pattern recognition—techniques he once used as a secret foot soldier during the Cold War. But this time he can speak up. “I’m working on ancient DNA data,” he said, “trying to get a story out of it.” "

I'm very curious about this. What will it bring?
In this fragment there is a kind of contrast, especially between "statistical pattern recognition" and “trying to get a story out of it”. I suppose they are different cups of tea. Statistical patterns are based on laws and on already known premises, and what you get are generalizations. Historical stories are narratives about the unknown and the unique; you have to make estimations and judgments (none of which is law-oriented). How do you combine the two?
So I'm curious about the step from statistics to story. Will it be more than a generalized kind of explanation?

PS: old literature, but it makes some good remarks about the differences:
http://www.personal.psu.edu/users/e/c/ecb5/Courses/M475W/WeeklyReadings/Week01-Aug26-Overview/The%20Mathematician,%20the%20Historian,%20and%20the%20History%20of%20Mathematics.pdf
 
I don't know whether I'm misunderstanding the arguments, but I have the feeling that everybody thinks science is just a matter of statistics. Let me therefore clarify some things about statistics. I will compare it with image processing, because image processing is nothing other than statistics applied to every single pixel of your picture. With just a bit of experience in that area, everybody has a basic understanding of statistics without a single lesson in your nerd school.

Ask any photographer; all of them will tell you one thing - a good picture shines on its own, bad pictures need Photoshop. Translated into statistics, you get a golden rule:

Good data reveal their information content on their own, bad data need statistics.

The reason you use filtering techniques on your picture is to separate garbage (noise) from your information (the image), and/or to select certain parts of the information at the cost of others you are not interested in. The filter itself is a well-defined mathematical algorithm and can be reproduced, in contrast to the noise, and in an ideal world it should cancel out the noise. Which brings us to the next fact:

Every single statistical operation does nothing other than produce an artifact.

If you do everything right, your series of statistical operations cancels out all the unwanted noise and superfluous information, at least in theory. Practice tells a different story. Coming back to our photographer... Let's assume we take a green laser pointer with a dispersing lens and illuminate an object we want to photograph. The picture will be visible in all three colour channels (the colour sensors overlap in their sensitivity), so you may produce a white-balanced colour picture, but you won't get colour information, because it's not in the original picture. Which brings us to the next conclusion:

Not even the most sophisticated statistical method can correct biased data.

Distressing, but a fact. Another problem you may face is not having enough information in your data set. When you subtract the filter (= the artifact) from your garbage, you reduce the mathematical accuracy of your filter's prediction. If the information content is less than this filter blurring, you will get either yet more random noise as a result or, if you don't remove the filter's contribution completely, a result that is just the filter artifact, something you get regularly with statistical operations. Instead of information, you have just the residues of your statistical operations. Which leads to another statement:

If you have bad data and make massive use of statistics, not even an experienced statistician can distinguish between information and artifacts.

So what's the conclusion? Without proper data acquisition, statistics are pretty much useless. Seeing statistics as the holy grail of science is misleading; getting good data is the only way to get professional results. Good statistics is just a tool to get every bit of information out of your work, but it should never be used to tease at least a bit of information out of an otherwise useless data set.

Cheerio to all the statistics fetishists!
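
The "biased data" point above fits in a few lines of simulation (a made-up miscalibrated instrument, not real measurements): averaging ever more readings shrinks the random noise, but the systematic offset survives any amount of statistics.

```python
# More data and more averaging remove random noise, but not a systematic offset.
import numpy as np

rng = np.random.default_rng(3)
true_value = 10.0
bias = 0.5   # every reading from the miscalibrated instrument is 0.5 too high

for n in (10, 1_000, 100_000):
    readings = true_value + bias + rng.normal(scale=2.0, size=n)
    print(f"n = {n:6d}: mean = {readings.mean():.3f} (truth is {true_value})")
# The mean converges to 10.5, not 10.0, no matter how large n gets.
```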
 
I'm afraid that seems like a straw man argument to me.

No one here has ever said or implied that proper and unbiased data selection isn't extremely important. In fact, we spend a lot of time pointing out when data hasn't been properly selected.

That isn't the point of this particular post. The point is that once you have good data you need to understand how to properly use statistics to interpret it. You also have to be rigorous in recording all stages of the analysis so that your conclusions can be checked.
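
On the documentation point, even something as small as writing the parameters, software versions, and random seed out next to the results makes checking possible later. A minimal sketch, with hypothetical file names:

```python
# Minimal provenance record saved alongside the results (file names are made up).
import json
import platform
import numpy as np

params = {
    "seed": 42,
    "n_bootstrap": 10_000,
    "input_file": "samples_v3.csv",          # hypothetical input
    "numpy_version": np.__version__,
    "python_version": platform.python_version(),
}

rng = np.random.default_rng(params["seed"])
# ... the actual analysis would run here, drawing all randomness from rng ...

with open("analysis_provenance.json", "w") as fh:
    json.dump(params, fh, indent=2)
```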

Maybe you should do more collection of data here by reading more of the content so that you don't come to hasty conclusions.
 
Maybe you should do more collection of data here by reading more of the content so that you don't come to hasty conclusions.
Maybe you should take some of your own medicine as well...
"Good science depends on good math... It's all applied mathematics." -> "I don't know if I'm understanding the arguments wrong or not, but I have the feeling that everybody thinks that science is just a matter of statistics."
"If you don't understand higher level statistics, you're going to produce shoddy results." -> "Good data reveal their information content on their own, bad data need statistics."
In fact, even clumsy statistics cannot ruin good data.

You don't need to share my opinions, but you don't read my arguments any more closely than I read yours, so your stomping on the floor is not exactly shaking me.
 
You're new here, so I'm going to give you a pass this time. Cut out the ad hominem style comments like "nerds" or "stomping on the floor". I don't know where you're used to posting, but this won't fly here.
 
"Good data reveal their information content on their own, bad data need statistics. [...] Cheerio to all the statistics fetishists!"

Hear, hear! Positivism simply must know its limits...
 
Yes indeed, let's go back to the belief that we have a direct, intuitive access to truth about the world: i.e. mysticism, rationalism. Let's chuck out logic while we're at it.

Sorry, count me out. That has no place in discussions about genetics or any of the other fact-based subjects we discuss here.
 
Yes indeed, let's go back to the belief that we have a direct, intuitive access to truth about the world: i.e. mysticism, rationalism. Sorry, count me out. That has no place in discussions about genetics or any of the other fact-based subjects we discuss here.

Black or white? Knowing the limits and pitfalls does not mean throwing it away... nor is it a plea for mysticism, etc.
 
