Is Most Published Research Wrong?

Veritasium

11 Aug 2016 12:22

Educational, Learning

TLDR: This script delves into the reliability of scientific research, questioning the validity of findings due to the misuse of p-values and the publish-or-perish culture in academia. It highlights the 'replicability crisis', showing that many prominent studies, including those in psychology and particle physics, have failed to replicate. The script also discusses the concept of 'p-hacking' and the incentives that lead researchers to prioritize publication over accuracy, but concludes with a note of optimism about recent efforts to improve scientific integrity.

Takeaways

🔮 The script discusses a study suggesting humans can see into the future, but questions the validity of such claims based on statistical significance.
📊 It highlights the use of p-values in determining the significance of study results, with a p-value of .01 meaning there is a 1% chance of seeing a result at least this extreme if only chance were at work.
🧐 The script challenges the common threshold of p < .05 for statistical significance, pointing out that it was arbitrarily chosen and may not be a reliable standard.
🤔 It raises concerns about the prevalence of false positives in scientific research due to factors like publication bias and the high rate of false hypotheses being tested.
🧬 The script mentions the 'Reproducibility Project' which found low rates of replication success in psychology studies, questioning the reliability of published research.
🍫 It uses a humorous example of a study claiming chocolate aids weight loss to illustrate the concept of 'p-hacking' and how small sample sizes can lead to misleading results.
🔬 The script points out that even in fields with stringent statistical requirements, like particle physics, false discoveries can occur due to biases in data interpretation.
📉 It discusses the incentives in scientific research that favor novel and statistically significant findings, potentially leading to an overemphasis on positive results.
🔄 The script acknowledges the challenges in replicating studies and the reluctance of journals to publish replication studies, which can hinder scientific self-correction.
🌟 It emphasizes the importance of peer review and methodological rigor in scientific research, despite the inherent flaws and the potential for incorrect conclusions.
💡 Lastly, the script concludes by reflecting on the human tendency to delude ourselves and the value of the scientific method as a more reliable way of knowing compared to other methods.

Q & A

What was the title of the article published in the 'Journal of Personality and Social Psychology' in 2011?
-The title of the article was 'Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect'.
What was the main claim of the 2011 study regarding the ability of people to see into the future?
-The main claim was that there was experimental evidence of anomalous retroactive influences on cognition and affect, that is, of future events influencing people's present responses, essentially implying the ability to see into the future.
What was the hit rate for participants when selecting the curtain with an image behind it, and what was considered significant?
-The hit rate for participants was 53% when selecting the curtain with an erotic image. It was considered significant because a result that far above the expected 50% chance level had a p-value of .01, below the usual .05 threshold.
What is a p-value and how is it used to assess the significance of study results?
-A p-value is a statistical measure that indicates the probability of obtaining results at least as extreme as the observed results, assuming that the null hypothesis is true. It is used to assess the significance of study results, with values less than 0.05 generally considered significant.
What is the issue with using a p-value threshold of .05 for determining statistical significance?
-Using a p-value threshold of .05 can lead to a high rate of false positives, especially when multiple hypotheses are being tested or when there is publication bias towards positive results.
What is 'p-hacking' and how does it increase the likelihood of false positives in research?
-P-hacking refers to the manipulation of data or statistical analysis methods to achieve a p-value below the threshold of significance (typically .05). This can involve selecting or excluding data points, changing analysis methods, or considering multiple variables, which increases the likelihood of finding at least one significant result by chance.
What was the result of the Reproducibility Project that attempted to replicate 100 psychology studies?
-The Reproducibility Project found that only 36% of the psychology studies had statistically significant results when replicated, indicating a serious problem with the reproducibility of published research.
What is the '5-sigma' standard used in particle physics and why is it significant?
-The '5-sigma' standard is a stringent requirement for statistical significance used in particle physics, which corresponds to a probability of roughly 1 in 3.5 million (about 0.0000003) of obtaining a false positive from random fluctuations alone. It is significant because it greatly reduces the likelihood of claiming a discovery based on random chance.
What is the impact of publication bias on the reproducibility of scientific findings?
-Publication bias, where journals preferentially publish studies with statistically significant positive results, can lead to an overrepresentation of false positives in the scientific literature and makes it difficult for researchers to assess the true validity of findings.
What steps are being taken to address the reproducibility crisis in science?
-Steps being taken include conducting large-scale replication studies, establishing platforms like Retraction Watch to publicize withdrawn papers, creating online repositories for unpublished negative results, and adopting practices like pre-registering hypotheses and methods for peer review before experiments.
Why is it important to consider the potential for error even when using the scientific method?
-It is important because even with rigorous methods, errors can occur due to various factors such as p-hacking, publication bias, and the complexity of interpreting data. Recognizing this helps maintain a critical approach to scientific findings and encourages continuous improvement in research practices.

Outlines

00:00

🔮 The Illusion of Future Sight in Scientific Studies

This paragraph discusses a controversial study published in the 'Journal of Personality and Social Psychology' that suggests humans may possess the ability to see into the future. The study involved nine experiments where participants predicted which of two curtains hid an image. The hit rate for erotic images was slightly higher than chance, leading to a p-value of .01, which is considered significant in scientific research. However, the paragraph questions the validity of this result, explaining that a p-value of less than .05 is typically seen as significant but may not be enough to accept extraordinary claims like perceiving the future. It also delves into the broader issue of false positives in published research, highlighting that the commonly used 5% threshold for statistical significance may not be stringent enough, and that the actual rate of false positives could be much higher due to factors like publication bias and the prevalence of 'p-hacking'.

05:03

🧐 The Perils of P-Hacking and Replication in Scientific Research

The second paragraph expands on the problem of false positives in scientific research, illustrating how p-hacking can lead to misleading results. It uses the example of a study claiming that eating chocolate aids weight loss, which was later revealed to be a result of small sample size and multiple measurements increasing the chance of false positives. The paragraph explains how researchers can manipulate data analysis to achieve significant p-values, even when there is no real effect. It also touches on the high standards of statistical significance in particle physics and the infamous case of the pentaquark, which was initially confirmed by multiple experiments but later debunked as a false discovery due to biased data analysis. The paragraph emphasizes the importance of replication in science but points out the challenges and biases that can hinder the process, including the reluctance of journals to publish replication studies and the pressure on researchers to produce novel and significant findings.

10:03

🛠️ Towards Improvement: Addressing the Reproducibility Crisis in Science

The final paragraph acknowledges the ongoing reproducibility crisis in science but highlights positive changes in the scientific community's approach to research. It mentions large-scale replication studies, the Retraction Watch website, and the use of online repositories for sharing negative results. The paragraph also discusses the move towards pre-registering studies, which can help reduce publication bias and p-hacking by ensuring that research is published regardless of outcomes, provided the methodology is sound. The narrator reflects on the human tendency to be misled, even with rigorous scientific methods, and emphasizes the importance of science as a reliable method for understanding the world, despite its flaws. The paragraph concludes with a thank you to supporters and a promotion for Audible.com, offering a free trial and recommending a specific book.

Keywords

💡Anomalous Retroactive Influences

Anomalous Retroactive Influences refer to the phenomenon where future events seem to influence past actions or cognition. In the context of the video, this concept is explored through an experiment suggesting that people might be able to predict the future. The script discusses a study that purportedly found evidence for this phenomenon, challenging the conventional understanding of time and causality.

💡p-value

A p-value is a statistical measure that indicates the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. The video explains that a p-value of .01 means there is a 1% chance of results like these arising by mere luck, which is considered significant in scientific research. The script uses p-values to discuss the validity of the study's findings on predicting the future.
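
To make the definition concrete, here is a minimal sketch of an exact one-sided binomial p-value for a guessing task with a 50% chance level. The trial count (1,000 guesses, 530 hits) is a hypothetical choice for illustration and is not taken from the 2011 study; `binomial_tail` is likewise an illustrative helper name.

```python
from math import comb

def binomial_tail(hits: int, trials: int, chance: float = 0.5) -> float:
    """One-sided p-value: P(X >= hits) when X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical numbers: 1,000 guesses with a 53% hit rate.
p_value = binomial_tail(hits=530, trials=1000)
print(f"p = {p_value:.4f}")
```

The same 53% hit rate gives a smaller p-value as the number of trials grows, which is why a hit rate alone does not determine significance.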

💡Statistical Significance

Statistical significance describes a result that would be unlikely to arise by chance alone if there were no real effect; it does not by itself establish that the effect is real. The video discusses how results with p-values less than .05 are typically considered statistically significant and publishable. However, it also points out the potential issues with this threshold, such as the arbitrary nature of the .05 value and the possibility of false positives.

💡Reproducibility

Reproducibility in science refers to the ability of other researchers to obtain the same results when the same methods are applied. The video highlights the importance of replication in verifying scientific findings. It mentions the Reproducibility Project, which attempted to replicate psychology studies and found a significant drop in the proportion of studies that achieved statistically significant results upon replication.

💡Null Hypothesis

The null hypothesis is a fundamental concept in statistical testing, typically set up as a statement of no effect or no difference. In the video, the null hypothesis is that people cannot see into the future, and the study's results would be due to chance. The p-value helps determine whether the observed data contradicts this null hypothesis.

💡False Positives

False positives occur when a test indicates an effect that does not actually exist. The video discusses how the use of p-values and the threshold of .05 can lead to a significant number of false positives in scientific research, especially when multiple hypotheses are being tested.
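
The arithmetic behind this concern can be sketched as follows. The inputs (1,000 hypotheses tested, 10% of them true, 80% statistical power, a .05 threshold) are illustrative assumptions rather than figures stated in this summary, and `expected_outcomes` is a hypothetical helper name.

```python
def expected_outcomes(n_hypotheses: int, share_true: float,
                      power: float, alpha: float) -> tuple[float, float]:
    """Return the expected (true positives, false positives)."""
    n_true = n_hypotheses * share_true
    n_false = n_hypotheses - n_true
    true_positives = n_true * power      # real effects that are detected
    false_positives = n_false * alpha    # null effects that pass p < alpha
    return true_positives, false_positives

# Illustrative assumptions, not figures taken from the summary above.
tp, fp = expected_outcomes(n_hypotheses=1000, share_true=0.10,
                           power=0.80, alpha=0.05)
false_share = fp / (tp + fp)
print(f"{tp:.0f} true positives, {fp:.0f} false positives")
print(f"share of significant results that are false: {false_share:.0%}")
```

Under these assumptions the share of false results among significant findings is far higher than the 5% threshold suggests, and it rises further if fewer tested hypotheses are true or if power is lower.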

💡p-hacking

p-hacking is the practice of manipulating or selectively choosing data or statistical techniques to achieve a result that is statistically significant. The video describes how researchers might engage in p-hacking to increase the likelihood of obtaining a p-value less than .05, which can lead to false conclusions being published.
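
One ingredient of p-hacking, testing many outcomes and reporting whichever one is significant, can be quantified with a short calculation. The sketch below assumes the tests are independent and that no real effect exists in any of them; `chance_of_any_false_positive` is a hypothetical helper name.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n_tests independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20):
    print(n_tests, round(chance_of_any_false_positive(n_tests), 3))
```

With a single test the chance of a spurious hit is 5%, but with 20 independent measurements it is already above 60%, so a study that measures many variables is likely to find something "significant" by chance.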

💡Publication Bias

Publication bias is the tendency of journals to preferentially publish studies with positive or statistically significant results, leading to an overrepresentation of such findings in the literature. The video points out that this bias can skew the perception of scientific consensus and the prevalence of true effects.

💡Replication Studies

Replication studies are experiments conducted to verify the results of previous research. The video discusses the challenges faced by researchers attempting to replicate studies, such as the difficulty in getting replication studies published and the potential for such studies to be rejected if they do not confirm the original findings.

💡5-sigma

5-sigma is a level of statistical significance used in particle physics, indicating a very low probability of a false positive. The video contrasts the high standards of particle physics with other fields, highlighting the discovery and subsequent debunking of the theta-plus pentaquark as an example of how even stringent standards can be met through p-hacking or other biases.
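
For a sense of scale, the probability that corresponds to a given number of sigmas can be computed from the normal distribution. This sketch assumes the one-sided tail convention, which is an assumption on my part and not something the summary specifies; `upper_tail` is a hypothetical helper name.

```python
from math import erfc, sqrt

def upper_tail(sigma: float) -> float:
    """P(Z > sigma) for a standard normal Z (one-sided tail)."""
    return 0.5 * erfc(sigma / sqrt(2))

p_five_sigma = upper_tail(5.0)
print(f"p = {p_five_sigma:.2e}, about 1 in {1 / p_five_sigma:,.0f}")
```

Under that convention 5-sigma works out to roughly 1 in 3.5 million, compared with 1 in 20 for the p < .05 threshold common in other fields.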

💡Peer Review

Peer review is the process by which researchers in a field evaluate the work of their colleagues to maintain scientific standards. The video touches on the limitations of peer review in preventing the publication of flawed research and the potential for bias in the publication process.

Highlights

In 2011, a study was published suggesting that people can see into the future, with a hit rate of 53% for erotic images, which was statistically significant with a p-value of .01.

The significance of p-values in determining whether a result is due to chance or a true effect, with a common threshold of .05 for publication.

The potential for a large portion of published research to be false, especially when considering the number of hypotheses tested and the statistical power of experiments.

The 2005 paper 'Why Most Published Research Findings Are False', highlighting the issues with the prevalence of false positives in scientific literature.

The Reproducibility Project's finding that only 36% of the psychology studies it re-ran produced statistically significant results again.

The challenge of replicating landmark cancer studies, with only 6 out of 53 studies successfully reproduced.

The phenomenon of 'p-hacking', where researchers manipulate data analysis to achieve statistically significant results.

The example of a study claiming that eating chocolate daily helps with weight loss, which was intentionally designed to increase the likelihood of false positives.

The issue of publication bias, where journals are more likely to publish studies with statistically significant results.

The incentives for scientists to publish novel and unexpected results, which can lead to an increase in tested hypotheses with a lower ratio of true relationships.

The difficulty of replicating studies and the reluctance of journals to publish replication studies, which hinders the self-correction of science.

The case of the pentaquark particle, where initial evidence was found but later studies could not confirm its existence, illustrating the problem of false discoveries in science.

The role of data interpretation in scientific research and how different research groups can draw different conclusions from the same data.

The steps being taken to address the reproducibility crisis in science, including large-scale replication studies and initiatives to publish null results.

The movement towards pre-registering hypotheses and methods for peer review before conducting experiments, aiming to reduce publication bias and p-hacking.

The reflection on the reliability of the scientific method despite its flaws, and the importance of using rigorous mathematical tools in the pursuit of truth.

The support for the video from Patreon and Audible.com, offering a free 30-day trial and highlighting the recommended book 'The Invention of Nature'.
