How Not to Fall for Bad Statistics - with Jennifer Rogers

The Royal Institution
7 Aug 2019 (42:20)

TLDR
In this talk, the speaker discusses how to interpret risk in everyday life, debunking common misconceptions and providing a toolbox for understanding headlines about health risks. They explore the difference between relative and absolute risks, the importance of considering confounding factors, and the misuse of statistics in advertising. The speaker uses humor and real-life examples to make complex statistical concepts accessible and engaging.

Takeaways
  • The media often presents risk information in a way that can be misleading, focusing on relative risks rather than absolute risks, which can distort the true level of danger.
  • When evaluating risk, it's crucial to consider the actual numbers and the context in which they are presented, rather than just the sensational headlines.
  • 🐊 In understanding risk, public perception can be vastly different from the actual data. For example, more people die from hippos than crocodiles, contrary to what many might believe.
  • The perceived risk of activities can be skewed by how they are measured. For instance, cycling is considered more dangerous than driving a car when measured by deaths per billion hours, but this doesn't account for other factors like distance traveled.
  • The impact of certain foods, like bacon, on health risks is often overstated in the media. It's important to look at the actual increase in risk (absolute risk) rather than just the percentage increase (relative risk).
  • The effectiveness of interventions like speed cameras can be misinterpreted due to regression to the mean, where improvements are attributed to the intervention rather than natural fluctuations in data.
  • When analyzing data, correlation does not necessarily imply causation. It's essential to consider other factors that might explain the observed relationship.
  • Spurious correlations can be found in any dataset, and they can be humorous or misleading. It's important to question whether there is a true causal relationship or if other factors are at play.
  • The concept of 'regression to the mean' is relevant in many areas, including sports, where temporary high or low performance can be misattributed to external factors like management changes.
  • In comparing cities or regions for safety, such as London and New York, short-term statistics can be misleading. Long-term trends provide a more accurate picture of relative safety.
  • Graphical representations of data can be manipulated to tell a specific story, sometimes at the expense of accuracy. It's important to scrutinize the scale and the actual numbers presented in graphs.
Q & A
  • What is the main topic of the talk?

    -The main topic of the talk is understanding risk and how to make sense of risk-related headlines.

  • Why are headlines about risk important for our daily lives?

    -Headlines about risk are important because they are supposed to inform our day-to-day decisions and help us live longer, healthier lives.

  • What is the first risk-related question the speaker asks the audience?

    -The first risk-related question the speaker asks is which animal is more dangerous: crocodiles or hippos.

  • According to the World Health Organisation, which animal causes more deaths, crocodiles or hippos?

    -According to the World Health Organisation, crocodiles cause more deaths than hippos, with 1,000 deaths a year compared to 500 from hippos.

  • What is the difference between relative risk and absolute risk?

    -Relative risk tells you the risk in one group compared to another, while absolute risk gives you the actual probability of an event occurring.

  • What is the example used in the talk to explain the difference between relative and absolute risk?

    -The example used is the risk of pancreatic cancer from eating bacon daily, which increases the relative risk by 20%, but in absolute terms, it means an increase from 5 to 6 cases in every 400 individuals.

  • What is the issue with comparing bacon and smoking as cancer risks based on statistical significance alone?

    -Comparing bacon and smoking based on statistical significance alone is misleading because it ignores the actual magnitude of the risks, which are vastly different.

  • What is the concept of 'regression to the mean' and how is it demonstrated in the talk?

    -Regression to the mean is the tendency for extreme results to be followed by more average results. It is demonstrated through a dice-rolling exercise where 'accidents' decrease after 'speed cameras' are placed, but this decrease is due to chance rather than the cameras.

  • Why did the speaker criticize the BBC's coverage of the story about living near a busy road and the risk of dementia?

    -The speaker criticized the BBC's coverage because it focused on a 7% increased risk of dementia from living near a busy road, while ignoring other factors like smoking and obesity, which have a much greater impact on dementia risk.

  • What is the issue with small sample sizes in surveys and how can it affect the reliability of the results?

    -Small sample sizes can lead to high uncertainty and unreliable results because they may not accurately represent the larger population. The confidence interval for the results may be wide, making it difficult to draw definitive conclusions.

  • How can graphics in advertisements or media sometimes mislead viewers about statistical data?

    -Graphics can mislead by using incorrect scales, presenting data in a misleading way, or not clearly showing the uncertainty in the data. This can result in viewers having a distorted understanding of the actual statistics.

  • What is the speaker's role in the Royal Statistical Society and what are they working on?

    -The speaker is a member of the Royal Statistical Society and is involved in a project aimed at improving data ethics in advertising.

Outlines
00:00
Understanding Risk in Daily Headlines

The speaker begins by discussing the prevalence of risk-related headlines in the media and their impact on public perception. They emphasize the importance of understanding how to interpret these risks, particularly the difference between relative and absolute risks. The speaker introduces the concept of a 'toolbox' for evaluating risk-related information and engages the audience with a survey on risky activities, highlighting common misconceptions about the dangers of animals, sports, and transportation modes.

05:01
The Risky Business of Bacon

This paragraph delves into the controversy surrounding bacon and its alleged link to cancer. The speaker critiques the media's portrayal of bacon as a significant cancer risk, explaining the difference between relative and absolute risk. They use the example of pancreatic cancer to illustrate how a 20% increased risk in relative terms translates to a much smaller increase in absolute terms. The speaker also addresses the World Health Organization's classification of processed meats as carcinogenic, comparing it to the risk of smoking, and emphasizes the need for a more nuanced understanding of risk.
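
The conversion from relative to absolute risk can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's own calculation: it uses the figures quoted in the Q & A above (a baseline of 5 cases per 400 people and a 20% relative increase), and the function name `absolute_risk` is a hypothetical helper introduced here.

```python
def absolute_risk(baseline_risk: float, relative_increase: float) -> float:
    """Return the absolute risk after a relative increase is applied to a baseline."""
    return baseline_risk * (1.0 + relative_increase)


# Figures quoted in the Q & A above: 5 cases per 400 people, 20% relative increase.
group_size = 400
baseline_cases = 5
baseline_risk = baseline_cases / group_size

exposed_risk = absolute_risk(baseline_risk, 0.20)
exposed_cases = exposed_risk * group_size

print(f"Baseline: {baseline_cases} in {group_size} ({baseline_risk:.2%})")
print(f"With daily bacon: {exposed_cases:.0f} in {group_size} ({exposed_risk:.2%})")
print(f"Absolute increase: {exposed_risk - baseline_risk:.2%}")
```

The same 20% relative increase applied to a much larger baseline would produce a much larger absolute change, which is why the baseline has to be known before a headline figure can be interpreted.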

10:03
Risk Perception and Measurement

The speaker continues the discussion on risk by examining how it is measured and perceived. They challenge the audience to think critically about the methods used to assess risk, using examples of cycling versus driving and the variability in risk assessment methods. The speaker also touches on the complexities of measuring risk in activities like flying, where the risk is not constant throughout the journey. This section underscores the importance of considering the methodology behind risk assessments.

15:08
🧠 Dementia Risk and Living Near Busy Roads

In this paragraph, the speaker critiques a study that linked living near a busy road to an increased risk of dementia. They highlight the importance of considering confounding factors and the need for a comprehensive analysis of risk factors. The speaker points out that the study did not control for family history, which could significantly influence both the risk of dementia and the likelihood of living near a busy road. They also discuss the media's selective reporting of risk factors, urging the audience to question what information is being omitted.

20:08
🎲 Regression to the Mean in Accident Statistics

The speaker introduces the concept of regression to the mean using a dice-rolling demonstration. They explain how placing speed cameras in areas with high accident rates might lead to a reduction in accidents, which could be mistakenly attributed to the effectiveness of the cameras. The speaker argues that this reduction could simply be a random fluctuation, illustrating the need for a more thorough analysis over time to determine the true impact of interventions like speed cameras.
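
A small simulation can make the dice argument concrete. The sketch below is an assumption-laden stand-in for the live demonstration, not a reconstruction of it: the number of sites, the use of two dice per site, and the name `simulate_speed_cameras` are all hypothetical. Every site's "accident count" is pure chance in both periods, and "cameras" are placed only at the sites with the highest first-period counts.

```python
import random


def simulate_speed_cameras(num_sites=1000, num_cameras=50, seed=0):
    """Simulate accident counts that are pure chance, before and after 'cameras'."""
    rng = random.Random(seed)

    def roll():
        # Accident count for one site in one period: the sum of two dice.
        return rng.randint(1, 6) + rng.randint(1, 6)

    before = [roll() for _ in range(num_sites)]
    after = [roll() for _ in range(num_sites)]  # independent of 'before'

    # Place cameras at the sites with the highest counts in the first period.
    ranked = sorted(range(num_sites), key=lambda i: before[i], reverse=True)
    camera_sites = ranked[:num_cameras]

    mean_before = sum(before[i] for i in camera_sites) / num_cameras
    mean_after = sum(after[i] for i in camera_sites) / num_cameras
    return mean_before, mean_after


mean_before, mean_after = simulate_speed_cameras()
print(f"Camera sites, mean accidents before: {mean_before:.2f}")
print(f"Camera sites, mean accidents after:  {mean_after:.2f}")
```

The cameras do nothing in this model, yet the selected sites still show a drop, because they were chosen for being extreme and their next roll is an ordinary one. Comparing against high-count sites that did not receive a camera is one way to separate a real effect from this artifact.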

25:09
✈️ Air Travel Safety and Statistical Fluctuations

This paragraph discusses the perception of air travel safety, particularly in light of a year with no passenger jet crashes. The speaker argues that such a statistic could be misleading and that fluctuations in crash rates are expected. They caution against overreacting to single-year data and emphasize the need for a broader perspective when evaluating safety trends. The speaker also addresses the comparison of murder rates in London and New York, highlighting the importance of considering longer-term data.

30:11
The Misuse of Statistics in Advertising

The speaker critiques the use of statistics in advertising, particularly in the context of a toothpaste advertisement. They explain the concept of uncertainty in statistical results and the importance of understanding the difference between probability theory and statistical inference. The speaker emphasizes the need for a clear understanding of what is being measured and the potential for variability in results, urging consumers to be skeptical of statistical claims in ads.
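
To see how much uncertainty a small survey carries, one option is to compute a confidence interval for the reported proportion. The sketch below uses the Wilson score interval, which is a choice made here rather than a method attributed to the talk, and the survey numbers (8 of 10 versus 800 of 1,000 respondents agreeing) are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical surveys with the same headline figure of 80% agreement.
small_low, small_high = wilson_interval(8, 10)
large_low, large_high = wilson_interval(800, 1000)

print(f"n = 10:   {small_low:.1%} to {small_high:.1%}")
print(f"n = 1000: {large_low:.1%} to {large_high:.1%}")
```

Both surveys would be advertised as "80% agree", but the interval for ten respondents stretches from roughly half to well over ninety percent, so the headline figure alone says little about the wider population.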

35:12
Sports Statistics and the Illusion of Causation

In this paragraph, the speaker discusses the misuse of statistics in sports, particularly in the context of managerial changes and their perceived impact on team performance. They explain how regression to the mean can explain seemingly sudden improvements in performance, which may be attributed to new managers but are actually part of the team's natural fluctuation in performance. The speaker also addresses the 'Sports Illustrated curse,' illustrating how exceptional performance can be followed by a return to average performance.

40:13
Misleading Graphics in Statistical Presentation

The speaker concludes by highlighting the importance of accurately presenting statistical information, particularly in graphics. They critique several examples of misleading or incorrect graphics, emphasizing the need for clarity and accuracy in data presentation. The speaker encourages the audience to be vigilant in interpreting statistical graphics, considering the scale, the source of the data, and the potential for misrepresentation.

Keywords
Risk
Risk is the possibility of harm or loss. In the context of the video, it is central to understanding how we perceive and react to potential dangers in our daily lives. The speaker discusses how headlines and statistics can influence our perception of risk, such as the risk of cancer from eating bacon or the risk of accidents in different sports.
Relative Risk
Relative risk is a measure of how much more likely an event is to occur in one group compared to another. The video emphasizes the importance of understanding relative risks in health-related headlines, such as the increased risk of pancreatic cancer from eating bacon. It's crucial because it only shows the change in risk compared to a baseline, not the actual risk levels.
Absolute Risk
Absolute risk is the actual probability of an event occurring. The speaker uses the example of pancreatic cancer to explain how to convert relative risk into absolute risk, highlighting the need to know the baseline risk before interpreting increased risks from certain behaviors like eating bacon.
Correlation vs. Causation
Correlation is a measure that expresses the extent to which two variables are linearly related, while causation implies a direct cause-and-effect relationship. The video uses humorous examples, like the correlation between fizzy drink consumption and teenage violence, to illustrate the danger of assuming causation from mere correlation.
Confounding Factors
Confounding factors are variables that influence both of the variables being studied and can therefore create or distort an apparent relationship between them. The speaker mentions how factors like family history can confound the relationship between living near a busy road and the risk of dementia, emphasizing the need to control for these factors in statistical analysis.
Regression to the Mean
Regression to the mean is the phenomenon where extreme measurements tend to be followed by measurements closer to the average. The video uses the example of speed cameras and their placement based on accident hotspots to demonstrate how this statistical concept can mislead interpretations of cause and effect.
Statistical Significance
Statistical significance indicates that an observed effect would be unlikely to arise by chance alone; by itself it says nothing about how large the effect is. The speaker critiques the World Health Organization's classification of processed meat as a carcinogen, arguing that statistical significance does not necessarily imply a high risk.
Uncertainty
Uncertainty in statistics refers to the inherent variability in data that can affect the reliability of results. The video discusses how uncertainty impacts the interpretation of survey results and the need for larger sample sizes to reduce this uncertainty and provide more reliable conclusions.
Confidence Interval
A confidence interval is a range of values that is likely to contain the true value of a parameter based on a statistical model. The speaker explains how confidence intervals can help determine whether a result is statistically significant by comparing it to the null hypothesis.
Data Ethics
Data ethics involves the responsible and fair handling of data, particularly in advertising and public communication. The video touches on the Royal Statistical Society's efforts to improve data ethics, highlighting the importance of transparency and accuracy in presenting statistical information.
Hypothesis Testing
Hypothesis testing is a statistical method used to determine if there is enough evidence to support a claim. The video explains how to use hypothesis testing to evaluate whether observed results, like a high percentage of agreement in a survey, are statistically significant and not just due to chance.
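
As a rough illustration of the idea, the sketch below runs a one-sided exact binomial test on a hypothetical survey in which 8 of 10 respondents agree, against a null hypothesis that agreement in the population is 50%. The numbers, the choice of test, and the name `binomial_tail` are assumptions made for this example and are not taken from the talk.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float) -> float:
    """Probability of observing k or more successes in n trials if the true rate is p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical survey: 8 of 10 agree; null hypothesis: true agreement is 50%.
p_value = binomial_tail(8, 10, 0.5)
print(f"One-sided p-value: {p_value:.4f}")
```

Under the 50% null, 56 of the 1,024 equally likely outcomes have eight or more agreements, so the p-value is a little above 0.05: an 80% headline from ten people is not strong evidence against a coin flip.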
Highlights

The speaker discusses the importance of understanding risk in everyday life and how to interpret risk-related headlines.

A toolbox of questions to consider when evaluating risk headlines is introduced.

The audience is engaged in a survey to assess their understanding of risk, involving dangerous animals, sports, and transport modes.

Crocodiles cause more deaths than hippos, and cheerleading causes more accidents than baseball.

The distinction between relative and absolute risks is explained using the example of bacon consumption and pancreatic cancer risk.

A comparison of the risk of lung cancer from smoking versus pancreatic cancer from bacon shows a significant difference.

The concept of statistical significance is explained and its limitations in quantifying risk are highlighted.

The importance of considering daily habits and lifestyle factors when evaluating health risks is discussed.

The difference between correlation and causation is emphasized with humorous examples, such as fizzy drinks and teenage violence.

Confounding factors are introduced as an explanation for spurious correlations.

The concept of regression to the mean is demonstrated with a dice-rolling experiment to illustrate random fluctuations.

The impact of regression to the mean on the interpretation of speed camera effectiveness is discussed.

The speaker shares personal experiences of challenging misleading statistics in advertising with Ryanair.

The misuse of statistics in a toothpaste advertisement is critiqued, highlighting the issue of small sample sizes.

The importance of considering effect size alongside sample size in statistical analysis is explained.

The speaker concludes with advice on interpreting statistics in the media, emphasizing the need for critical thinking.
