How to lie with statistics
TLDR
David Sigerson, a solutions consultant with Tableau, delivers an insightful presentation on the nuances of statistical interpretation and the potential for manipulation in data presentation. The session, humorously titled 'How to Lie with Statistics,' aims to educate attendees on recognizing misleading data and the impact of statistics on decision-making. Sigerson uses real-world examples, including a British political campaign and a discussion on chocolate and weight loss, to illustrate common statistical fallacies. He emphasizes the importance of understanding the context and source of data, the influence of cognitive biases, and the need for critical thinking when interpreting statistical information. The presentation also touches on the limitations of AI and machine learning in handling biased or extrapolated data, advocating for a responsible approach to data analysis and sharing.
Takeaways
- **Art as a Metaphor**: Banksy's 'Peckham Rock' in the British Museum serves as a metaphor for how people often fail to question what's right in front of them, highlighting the importance of critical thinking when it comes to data and statistics.
- **Questioning Data**: We should always question the source and intention behind the data presented to us, considering why a particular statistic is being shared and whether it might be misleading or too good to be true.
- **Impact of Visualization**: Visualizations can be powerful tools for understanding data, but they can also be misused to distort perceptions, as demonstrated by the examples of 3D pie charts and Apple's sales curve.
- **Thinking Systems**: Daniel Kahneman's System 1 (fast, intuitive thinking) and System 2 (slow, logical thinking) explain how our brains process information quickly but may not always engage in deeper, more accurate analysis, especially with statistics.
- **Avoiding Misleading Presentations**: Presenting data accurately and ethically is crucial. Techniques such as anchoring and framing can subconsciously influence perceptions and decisions, often leading to incorrect conclusions.
- **Understanding Data Distributions**: Means and medians are essential statistical measures that can reveal different aspects of data distribution. Being aware of their differences helps in interpreting data correctly and avoiding skewed results.
- **Recognizing Skewed Data**: Real-world data is often skewed, and it's important to recognize this to avoid misconceptions. For instance, self-assessments of driving ability often result in a skewed distribution where most people rate themselves above average.
- **Context Matters**: The context in which data is presented significantly influences perception. For example, the same statistical figures can appear more or less appealing depending on whether they are framed positively or negatively.
- **The Dangers of Extrapolation**: Small sample sizes or short data sets should not be used to make broad predictions or assumptions about the future, as this can lead to inaccurate and potentially harmful outcomes.
- **Due Diligence in Data Collection**: It's important to be skeptical of data sources and to perform due diligence. This includes understanding potential biases, the representativeness of samples, and the possibility of confounding variables.
- **Engagement with Data**: Actively engaging with data, questioning its validity, and seeking to understand its context and collection methods is essential for making informed decisions and avoiding manipulation.
- **Collaboration and Peer Review**: The scientific method of forming hypotheses, collecting data, and then testing these hypotheses through peer review is a reliable way to ensure that statistical findings are robust and credible.
Q & A
What is the main theme of the presentation by David Sigerson?
- The main theme of the presentation is to educate the audience on how statistics can be misleading or manipulated to 'lie', and to encourage a more critical and mindful approach to interpreting numerical data.
Why did Banksy's 'Peckham Rock' art piece in the British Museum go unnoticed for three days?
- The 'Peckham Rock' art piece went unnoticed because people often fail to question the things around them, illustrating how easily people can overlook even significant details when they don't expect to find anything out of the ordinary.
What is the significance of the term 'modern caveman' in the context of the presentation?
- The term 'modern caveman' is used to describe the part of our brain that makes quick, instinctive decisions without conscious thought, which can sometimes lead to incorrect assumptions or misinterpretations of statistical data.
What is the role of visual analytics in the interpretation of data?
- Visual analytics leverages the human visual system's ability to quickly spot patterns using pre-attentive visual attributes like color, size, and shape. However, it can be misused by presenting data in a way that leads to incorrect inferences or hides certain aspects of the data.
Why is it important to consider the source and context of a statistic before accepting it as true?
- Considering the source and context is crucial because it helps to determine the credibility and relevance of the statistic. Without this, one may fall prey to misleading or false information, as statistics can be manipulated or presented out of context to support a particular narrative.
What is the 'anchoring' bias and how does it influence decision-making?
- The 'anchoring' bias refers to the human tendency to rely too heavily on the first piece of information encountered when making decisions. This can lead to incorrect assumptions and choices, as subsequent information may be adjusted to align with the initial, potentially misleading, anchor.
What does the presenter mean by 'framing' in the context of statistics?
- Framing refers to the way a statistic or information is presented or contextualized. How data is framed can significantly influence perceptions and decisions, as people tend to respond more positively to information presented in a positive light and more negatively to the same information presented negatively.
Why is it difficult for people to accurately calculate the overall change in salary after a 100% increase followed by a 50% decrease?
- This difficulty arises because people often rely on their intuitive 'caveman' brain, which is not adept at handling percentages and the changing base they apply to. The 50% decrease applies to the doubled salary, not the original one, so the final salary is the same as the original. This contradicts the intuitive belief that there would be an overall increase.
What is the significance of the phrase 'pig in a poke' in the context of the presentation?
- The phrase 'pig in a poke' is used as a metaphor for being deceived by appearances or by accepting something without proper scrutiny. In the context of the presentation, it refers to the danger of accepting statistical information at face value without understanding its underlying truth or validity.
How does the presenter suggest improving the accuracy and reliability of data interpretation?
- The presenter suggests a democratic approach to fact-checking, where data is shared widely, allowing many people to review and interpret it. This includes providing context, references, and encouraging collaboration and peer review to ensure the accuracy and reliability of data interpretation.
What role does Tableau play in the process of data analysis and interpretation as described by the presenter?
- Tableau is presented as a tool that facilitates the sharing, collaboration, and critical examination of data. It allows users to define hypotheses, visualize data, and share findings with others for review and feedback, thus promoting a more thorough and reliable analysis.
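The salary question in the Q&A above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the starting salary and the helper name `apply_changes` are hypothetical and do not come from the presentation. It contrasts the correct calculation, where each percentage applies to the current value, with the intuitive shortcut of adding the percentages together.

```python
def apply_changes(start, pct_changes):
    """Apply successive percentage changes, each to the current value."""
    value = start
    for pct in pct_changes:
        value *= 1 + pct / 100
    return value


salary = 30_000  # hypothetical starting salary

# Correct: +100% doubles the salary, then -50% halves the doubled amount.
final = apply_changes(salary, [100, -50])

# Intuitive but wrong: add the percentages (+100 - 50 = +50) and apply once.
naive = salary * (1 + (100 - 50) / 100)

print(f"Correct final salary: {final:,.0f}")
print(f"Naive 'net +50%' guess: {naive:,.0f}")
```

The two results differ because the second percentage is taken from a larger base than the first, which is the detail that fast, intuitive thinking tends to skip.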
Outlines
Introduction to the Presentation on Deception in Statistics
David Sigerson, a solutions consultant with Tableau, introduces the presentation topic, 'How to Lie with Statistics.' He discusses the importance of questioning the information we encounter daily and the role of statistics in decision-making. David uses the example of Banksy's 'Peckham Rock' to illustrate how people often fail to question what's right in front of them. He also emphasizes the significance of recognizing misleading information and the impact of statistics on our perception.
Trust and the Misuse of Statistics
The speaker delves into the concept of trust, particularly in professions such as politics, and how it relates to the acceptance of statistical information. He highlights a study ranking professions by trustworthiness and discusses the British public's perception of politicians, footballers, and nurses. David then transitions into the misuse of statistics, exemplified by the Brexit campaign's misleading claims about NHS funding. He also touches on how easily people can be misled by catchy headlines, such as those suggesting weight loss through chocolate consumption.
The Power of Visual Analytics and Pre-attentive Attributes
David explores how visual analytics can be used to highlight patterns quickly through pre-attentive visual attributes like color, size, and shape. However, he warns of the potential for misuse, as these attributes can lead to incorrect inferences if data is presented in a manipulative way. He uses examples from car advertisements and Tim Cook's presentation at an Apple conference to illustrate how data can be visually distorted to tell a more compelling story that may not accurately represent the underlying numbers.
The Dual-Process Theory and the Role of System 1 and System 2 Thinking
The speaker explains the dual-process theory proposed by Daniel Kahneman, which differentiates between System 1 (fast, intuitive thinking) and System 2 (slow, logical thinking). He uses the example of Linda, a fictional character, to demonstrate how people often rely on System 1 thinking, leading to potentially incorrect judgments. David emphasizes the importance of context in decision-making and the influence of biases on our thought processes.
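The summary does not spell out the Linda example, so the sketch below assumes the talk used the classic form of the problem: people judge "Linda is a bank teller and an activist" as more likely than "Linda is a bank teller". The population counts are invented purely for illustration. Whatever counts are chosen, the combined description can never be more probable than either part on its own, because everyone who matches both conditions also matches each one.

```python
from fractions import Fraction

# Hypothetical population of 1,000 people who fit Linda's description.
# The counts are made up for illustration only.
population = (
    [("teller", "activist")] * 20
    + [("teller", "not activist")] * 30
    + [("other job", "activist")] * 600
    + [("other job", "not activist")] * 350
)


def probability(predicate):
    """Share of the population for which the predicate holds."""
    matches = sum(1 for person in population if predicate(person))
    return Fraction(matches, len(population))


p_teller = probability(lambda p: p[0] == "teller")
p_teller_and_activist = probability(lambda p: p == ("teller", "activist"))

print("P(teller) =", p_teller)
print("P(teller and activist) =", p_teller_and_activist)
```

Even in this population, where activists are far more common than bank tellers, the more detailed story is the less probable one. System 1 finds it more plausible only because it sounds more representative.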
The Impact of Anchoring and Framing on Decision-Making
David discusses two cognitive biases: anchoring and framing. Anchoring refers to the tendency to rely too heavily on the first piece of information encountered when making decisions. He uses the example of a shopping experience to illustrate how initial high prices can influence how much one is willing to spend. Framing is shown through how information is presented, either positively or negatively, which can significantly affect perceptions and choices. The speaker also touches on the public's misunderstanding of statistical terms like 'mean' and 'median' and how real-world data often doesn't follow a normal distribution.
The Importance of Understanding Data Distribution and Percentiles
The speaker addresses the importance of comprehending data distribution and the potential pitfalls of relying on mean and median values. He uses the example of adding extremely wealthy individuals to a group to show how the mean can be skewed, while the median remains a more stable measure of central tendency. David also explains the concept of compound interest and how it's often misunderstood, leading to financial decisions that aren't in the best interest of the individual.
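The effect of adding one extremely wealthy person to a group can be reproduced with Python's standard `statistics` module. This is a minimal sketch with invented income figures, not data from the talk.

```python
from statistics import mean, median

# Hypothetical annual incomes for a small group (illustrative values only).
incomes = [28_000, 32_000, 35_000, 40_000, 45_000]

# The same group after one billionaire joins it.
with_billionaire = incomes + [1_000_000_000]

print(f"Before: mean = {mean(incomes):,.0f}, median = {median(incomes):,.0f}")
print(
    f"After:  mean = {mean(with_billionaire):,.0f}, "
    f"median = {median(with_billionaire):,.0f}"
)
```

The mean jumps to well over a hundred million while the median barely moves, so a report that quotes only "average income" for the second group would describe almost nobody in it.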
Being Vigilant Against Data Manipulation
David warns about the potential for data manipulation, using the example of a politician's misleading tweet about salary increases. He emphasizes the need for critical thinking when presented with statistical information. The speaker also discusses the challenges of data collection, such as ensuring the reliability of survey responses and being aware of the limitations and biases in data sources. He concludes by stressing the importance of always seeking to verify the accuracy and source of statistical claims.
The Role of Data Analysis Tools in Promoting Transparency and Collaboration
The speaker highlights the role of data analysis tools like Tableau in fostering a transparent and collaborative approach to data analysis. He discusses how such tools can help in formulating hypotheses, testing them rigorously, and sharing findings for peer review. David also emphasizes the importance of making data accessible for broader review within an organization, allowing more people to engage with the data and contribute to a more accurate understanding of it.
Final Thoughts on Engaging with Statistics
In his closing remarks, David encourages the audience to take an active role in critically evaluating statistical information and to be aware of potential manipulation. He advises providing context and references when sharing data and promoting a culture of fact-checking and transparency. The speaker leaves the audience with a final thought on the importance of staying alert for statistical manipulation and the role of individuals in ensuring the integrity of data interpretation.
Keywords
Statistics
Data Visualization
Bias
Conditional Probability
System 1 and System 2 Thinking
Means and Medians
Extrapolation
Correlation and Causation
Data Collection
Pig in a Poke
Democratization of Data
Highlights
David Sigerson discusses the importance of questioning the data we encounter daily and its potential to be misleading.
The presentation is inspired by the book 'How to Lie with Statistics' and updated with contemporary examples.
Banksy's 'Peckham Rock' art piece in the British Museum is used as an analogy for how people often fail to recognize deception.
The concept of 'modern caveman' is introduced to illustrate how our primal instincts can lead us to make quick, potentially incorrect judgments.
Sigerson emphasizes the role of trust in professions and how it can influence our acceptance of statistical information.
The misuse of a statistic during the Brexit campaign is highlighted to show the long-term impact of misleading data.
The talk explores how visual analytics can be manipulated to present data in a more favorable, yet potentially deceptive, light.
Examples of how companies like Apple use visual representation to their advantage, sometimes at the cost of data accuracy.
The unconscious 'System 1' thinking is explained as a factor in how people quickly make decisions based on data presented to them.
The difference between means and medians is discussed as a way to either reveal or hide income inequality.
The concept of 'anchoring' is introduced to explain how initial figures can influence our perception and decision-making.
Framing effects are shown to have a significant impact on how we perceive statistics, with examples of positive and negative frames.
The importance of understanding the context of data and how our brains often fail to grasp the complexities of percentages and compound interest.
The 'pig in a poke' analogy is used to caution against accepting data or deals that seem too good to be true without proper scrutiny.
The challenges of data collection are discussed, including the potential for bias, lack of clarity, and leading questions in surveys.
The role of correlations in data analysis is examined, with a warning against assuming causation from correlated data.
The potential pitfalls of AI and machine learning are highlighted, particularly the issues of extrapolation and bias.
The necessity of the scientific method in data analysis is emphasized, including hypothesis testing and peer review.
Tableau's role in data analysis is presented as a tool for hypothesis testing, data sharing, and collaboration to ensure data accuracy.
Sigerson concludes with a call to action for individuals to engage critically with data and to be vigilant against potential manipulation.