How to lie with statistics

Tableau
25 Oct 2018 | 47:19
Educational, Learning

TLDR: David Sigerson, a solutions consultant with Tableau, presents on how statistics are interpreted and how data presentation can be manipulated. The session, humorously titled 'How to Lie with Statistics,' aims to teach attendees to recognize misleading data and to understand the impact of statistics on decision-making. Sigerson uses real-world examples, including a British political campaign and a headline about chocolate and weight loss, to illustrate common statistical fallacies. He emphasizes the importance of understanding the context and source of data, the influence of cognitive biases, and the need for critical thinking when interpreting statistical information. The presentation also touches on the limitations of AI and machine learning when handling biased or extrapolated data, and advocates a responsible approach to data analysis and sharing.

Takeaways
  • 🎨 **Art as a Metaphor**: Banksy's 'Peckham Rock' in the British Museum serves as a metaphor for how people often fail to question what's right in front of them, highlighting the importance of critical thinking when it comes to data and statistics.
  • 🧐 **Questioning Data**: We should always question the source and intention behind the data presented to us, considering why a particular statistic is being shared and whether it might be misleading or too good to be true.
  • 📈 **Impact of Visualization**: Visualizations can be powerful tools for understanding data, but they can also be misused to distort perceptions, as demonstrated by the examples of 3D pie charts and Apple's sales curve.
  • 🧠 **Thinking Systems**: Daniel Kahneman's System 1 (fast, intuitive thinking) and System 2 (slow, logical thinking) explain why our brains process information quickly but often skip the deeper, more accurate analysis that statistics require.
  • 🚫 **Avoiding Misleading Presentations**: Presenting data accurately and ethically is crucial. Techniques such as anchoring and framing can subconsciously influence perceptions and decisions, often leading to incorrect conclusions.
  • 📊 **Understanding Data Distributions**: The mean and the median summarize a distribution in different ways: the mean is pulled toward extreme values, while the median is not. Knowing which one is being reported helps in interpreting skewed data correctly.
  • 📉 **Recognizing Skewed Data**: Real-world data is often skewed, and it's important to recognize this to avoid misconceptions. For instance, self-assessments of driving ability often result in a skewed distribution where most people rate themselves above average.
  • 🧮 **Context Matters**: The context in which data is presented significantly influences perception. For example, the same statistical figures can appear more or less appealing depending on whether they are framed positively or negatively.
  • ⚖️ **The Dangers of Extrapolation**: Small sample sizes or short data sets should not be used to make broad predictions or assumptions about the future, as this can lead to inaccurate and potentially harmful outcomes.
  • 🔍 **Due Diligence in Data Collection**: It's important to be skeptical of data sources and to perform due diligence. This includes understanding potential biases, the representativeness of samples, and the possibility of confounding variables.
  • 🤔 **Engagement with Data**: Actively engaging with data, questioning its validity, and seeking to understand its context and collection methods is essential for making informed decisions and avoiding manipulation.
  • 🤝 **Collaboration and Peer Review**: The scientific method of forming hypotheses, collecting data, and then testing these hypotheses through peer review is a reliable way to ensure that statistical findings are robust and credible.
Q & A
  • What is the main theme of the presentation by David Sigerson?

    -The main theme of the presentation is to educate the audience on how statistics can be misleading or manipulated to 'lie', and to encourage a more critical and mindful approach to interpreting numerical data.

  • Why did Banksy's 'Peckham Rock' art piece in the British Museum go unnoticed for three days?

    -The 'Peckham Rock' art piece went unnoticed because people often fail to question the things around them, illustrating how easily people can overlook even significant details when they don't expect to find anything out of the ordinary.

  • What is the significance of the term 'modern caveman' in the context of the presentation?

    -The term 'modern caveman' is used to describe the part of our brain that makes quick, instinctive decisions without conscious thought, which can sometimes lead to incorrect assumptions or misinterpretations of statistical data.

  • What is the role of visual analytics in the interpretation of data?

    -Visual analytics leverages the human visual system's ability to quickly spot patterns using pre-attentive visual attributes like color, size, and shape. However, it can be misused by presenting data in a way that leads to incorrect inferences or hides certain aspects of the data.

  • Why is it important to consider the source and context of a statistic before accepting it as true?

    -Considering the source and context is crucial because it helps to determine the credibility and relevance of the statistic. Without this, one may fall prey to misleading or false information, as statistics can be manipulated or presented out of context to support a particular narrative.

  • What is the 'anchoring' bias and how does it influence decision-making?

    -The 'anchoring' bias refers to the human tendency to rely too heavily on the first piece of information encountered when making decisions. This can lead to incorrect assumptions and choices, as subsequent information may be adjusted to align with the initial, potentially misleading, anchor.

  • What does the presenter mean by 'framing' in the context of statistics?

    -Framing refers to the way a statistic or information is presented or contextualized. How data is framed can significantly influence perceptions and decisions, as people tend to respond more positively to information presented in a positive light and more negatively to the same information presented negatively.

  • Why is it difficult for people to accurately calculate the average increase in salary over two periods with a 100% increase followed by a 50% decrease?

    -This difficulty arises because people rely on their intuitive 'caveman' brain, which handles percentages poorly: it tends to combine the two figures (+100% and -50%) directly instead of noticing that the 50% decrease applies to the already doubled salary. Worked through, the final salary equals the original, which contradicts the intuitive belief that there would be an overall increase.

  • What is the significance of the phrase 'pig in a poke' in the context of the presentation?

    -The phrase 'pig in a poke' is used as a metaphor for being deceived by appearances or by accepting something without proper scrutiny. In the context of the presentation, it refers to the danger of accepting statistical information at face value without understanding its underlying truth or validity.

  • How does the presenter suggest improving the accuracy and reliability of data interpretation?

    -The presenter suggests a democratic approach to fact-checking, where data is shared widely, allowing many people to review and interpret it. This includes providing context, references, and encouraging collaboration and peer review to ensure the accuracy and reliability of data interpretation.

  • What role does Tableau play in the process of data analysis and interpretation as described by the presenter?

    -Tableau is presented as a tool that facilitates the sharing, collaboration, and critical examination of data. It allows users to define hypotheses, visualize data, and share findings with others for review and feedback, thus promoting a more thorough and reliable analysis.
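
The salary question above (a 100% increase followed by a 50% decrease) can be checked with a few lines of arithmetic. This is a minimal sketch, not something shown in the talk: the starting salary and the helper name `apply_changes` are assumptions made for illustration.

```python
def apply_changes(start, percent_changes):
    """Apply successive percentage changes; each one acts on the current value."""
    value = start
    for pct in percent_changes:
        value *= 1 + pct / 100
    return value


salary = 30_000  # hypothetical starting salary
final_salary = apply_changes(salary, [100, -50])

# Intuition averages the two percentages and expects a net gain.
naive_average_change = (100 + -50) / 2

print(f"Naive average change per period: {naive_average_change:+.0f}%")
print(f"Start: {salary}, after +100% then -50%: {final_salary:.0f}")
```

The changes multiply (2.0 times 0.5 equals 1.0) rather than add, so the salary ends where it started even though the naive average suggests a 25% gain per period.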

Outlines
00:00
😀 Introduction to the Presentation on Deception in Statistics

David Sigerson, a solutions consultant with Tableau, introduces the presentation topic, 'How to Lie with Statistics.' He discusses the importance of questioning the information we encounter daily and the role of statistics in decision-making. David uses the example of Banksy's 'Peckham Rock' to illustrate how people often fail to question what's right in front of them. He also emphasizes the significance of recognizing misleading information and the impact of statistics on our perception.

05:02
😕 Trust and the Misuse of Statistics

The speaker delves into the concept of trust, particularly in professions such as politics, and how it relates to the acceptance of statistical information. He highlights a study ranking professions by trustworthiness and discusses the British public's perception of politicians, footballers, and nurses. David then transitions into the misuse of statistics, exemplified by the Brexit campaign's misleading claims about NHS funding. He also touches on how easily people can be misled by catchy headlines, such as those suggesting weight loss through chocolate consumption.

10:03
📊 The Power of Visual Analytics and Pre-attentive Attributes

David explores how visual analytics can be used to highlight patterns quickly through pre-attentive visual attributes like color, size, and shape. However, he warns of the potential for misuse, as these attributes can lead to incorrect inferences if data is presented in a manipulative way. He uses examples from car advertisements and Tim Cook's presentation at an Apple conference to illustrate how data can be visually distorted to tell a more compelling story that may not accurately represent the underlying numbers.

15:03
🧠 The Dual-Process Theory and the Role of System 1 and System 2 Thinking

The speaker explains the dual-process theory proposed by Daniel Kahneman, which differentiates between System 1 (fast, intuitive thinking) and System 2 (slow, logical thinking). He uses the example of Linda, a fictional character, to demonstrate how people often rely on System 1 thinking, leading to potentially incorrect judgments. David emphasizes the importance of context in decision-making and the influence of biases on our thought processes.

20:05
💰 The Impact of Anchoring and Framing on Decision-Making

David discusses two cognitive biases: anchoring and framing. Anchoring refers to the tendency to rely too heavily on the first piece of information encountered when making decisions. He uses the example of a shopping experience to illustrate how initial high prices can influence how much one is willing to spend. Framing is shown through how information is presented—either positively or negatively—which can significantly affect perceptions and choices. The speaker also touches on the public's misunderstanding of statistical terms like 'means' and 'medians' and how real-world data often doesn't follow a normal distribution.

25:06
📈 The Importance of Understanding Data Distribution and Percentiles

The speaker addresses the importance of comprehending data distribution and the pitfalls of relying on a single summary value such as the mean. He uses the example of adding extremely wealthy individuals to a group to show how the mean can be skewed, while the median remains a more stable measure of central tendency. David also explains the concept of compound interest and how it's often misunderstood, leading to financial decisions that aren't in the best interest of the individual.

30:06
🕵️‍♂️ Being Vigilant Against Data Manipulation

David warns about the potential for data manipulation, using the example of a politician's misleading tweet about salary increases. He emphasizes the need for critical thinking when presented with statistical information. The speaker also discusses the challenges of data collection, such as ensuring the reliability of survey responses and being aware of the limitations and biases in data sources. He concludes by stressing the importance of always seeking to verify the accuracy and source of statistical claims.

35:07
🔍 The Role of Data Analysis Tools in Promoting Transparency and Collaboration

The speaker highlights the role of data analysis tools like Tableau in fostering a transparent and collaborative approach to data analysis. He discusses how such tools can help in formulating hypotheses, testing them rigorously, and sharing findings for peer review. David also emphasizes the importance of making data accessible for broader review within an organization, allowing more people to engage with the data and contribute to a more accurate understanding of it.

40:08
🌟 Final Thoughts on Engaging with Statistics

In his closing remarks, David encourages the audience to take an active role in critically evaluating statistical information and to be aware of potential manipulation. He advises providing context and references when sharing data and promoting a culture of fact-checking and transparency. The speaker leaves the audience with a final thought on the importance of staying alert for statistical manipulation and the role of individuals in ensuring the integrity of data interpretation.

Keywords
💡Statistics
Statistics refers to the collection, analysis, interpretation, presentation, and organization of data. In the context of the video, it emphasizes how statistics can be manipulated to deceive or mislead, which is the central theme of the presentation 'How to Lie with Statistics.' The speaker, David Sigerson, uses various examples to illustrate the misuse of statistics in everyday life and decision-making.
💡Data Visualization
Data visualization is the graphical representation of information and data. It is a crucial aspect discussed in the video, as it can be used effectively to reveal patterns in data or be manipulated to mislead viewers. An example given is a 3D pie chart used in an advertisement that distorts the true market share by hiding slices behind others, thus misleading the audience about the actual data.
💡Bias
Bias in statistics refers to a systematic distortion or misrepresentation of information. The video discusses different types of biases, such as anchoring bias and framing, which affect how people perceive and interpret data. For instance, the concept of framing is explained, where the same statistic is presented in a positive or negative light to influence the audience's perception.
💡Conditional Probability
Conditional probability is a concept in statistics that deals with the probability of an event given the occurrence of a related event. It is mentioned in the context of a psychological experiment involving Linda, where the audience is shown to be poor at assessing conditional probabilities, thus highlighting the limitations of human intuition in statistical reasoning.
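
The Linda experiment is commonly described in the psychology literature as the conjunction fallacy: people judge "bank teller and feminist" as more likely than "bank teller" alone, although a conjunction can never be more probable than either of its parts. The sketch below illustrates this with counts; the population and its numbers are entirely made up and are not from the talk.

```python
# Hypothetical population of 100 people who match Linda's description.
# Each tuple is (is_bank_teller, is_feminist); the counts are invented.
population = (
    [(True, True)] * 4
    + [(True, False)] * 1
    + [(False, True)] * 80
    + [(False, False)] * 15
)


def probability(people, predicate):
    """Share of people for whom the predicate holds."""
    return sum(1 for person in people if predicate(person)) / len(people)


p_teller = probability(population, lambda p: p[0])
p_teller_and_feminist = probability(population, lambda p: p[0] and p[1])

# Conditional probability: P(feminist | teller) = P(both) / P(teller)
p_feminist_given_teller = p_teller_and_feminist / p_teller

print(f"P(teller)              = {p_teller:.2f}")
print(f"P(teller and feminist) = {p_teller_and_feminist:.2f}")
print(f"P(feminist | teller)   = {p_feminist_given_teller:.2f}")
```

Even in a population where most bank tellers are feminists, the conjunction is still the rarer event; the high conditional probability is what intuition tends to confuse it with.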
💡System 1 and System 2 Thinking
This concept, from Daniel Kahneman's work, differentiates between two modes of thought in the human brain. System 1 is fast, intuitive, and operates unconsciously, while System 2 is slow, logical, and conscious. In the video, it is used to explain why people often fall for statistical manipulations—System 1 quickly makes associations and decisions without delving into the complexities that System 2 might consider.
💡Means and Medians
Means and medians are measures of central tendency in statistics. The mean is the arithmetic average of a data set (the sum of the values divided by their count), while the median is the middle value when the data points are arranged in order. The video points out that the mean can be skewed by outliers, such as the example of Bill Gates entering a room of average earners, which significantly raises the mean but barely moves the median salary.
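
The effect of a single outlier on the mean and the median can be seen in a short sketch using Python's standard `statistics` module. The salary figures, including the billion-dollar income standing in for the very wealthy newcomer, are hypothetical values chosen for illustration.

```python
from statistics import mean, median

# Hypothetical annual salaries of five people in a room.
salaries = [28_000, 31_000, 35_000, 40_000, 46_000]

# One extremely high earner walks in (the figure is invented).
with_outlier = salaries + [1_000_000_000]

print(f"Before: mean={mean(salaries):,.0f}  median={median(salaries):,.0f}")
print(f"After:  mean={mean(with_outlier):,.0f}  median={median(with_outlier):,.0f}")
```

In this toy data the mean jumps from tens of thousands to well over a hundred million, while the median only shifts to the midpoint of the two middle salaries.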
💡Extrapolation
Extrapolation is the process of estimating or predicting data points outside the range of known data. The video warns against the dangers of extrapolation, as it can lead to false assumptions and conclusions. An example is given where a small data set is used to make broad predictions about future sales, which can be misleading.
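
As a rough illustration of the extrapolation warning, the sketch below fits a straight line to four months of sales and projects it two years ahead. The sales figures, the market ceiling, and the helper `fit_line` are all assumptions for illustration; the talk does not give these numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4]
sales = [100, 200, 300, 400]  # hypothetical units sold per month

slope, intercept = fit_line(months, sales)
forecast_month_24 = slope * 24 + intercept

market_size = 1_000  # hypothetical ceiling on monthly demand
print(f"Linear forecast for month 24: {forecast_month_24:.0f} units")
print(f"Exceeds the assumed market size: {forecast_month_24 > market_size}")
```

The line fits the four observed points perfectly, yet the projection lands far above what the assumed market could absorb. A good fit inside the observed range says little about behavior outside it.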
💡Correlation and Causation
Correlation refers to a statistical relationship between two variables, while causation implies a direct cause-and-effect relationship. The video emphasizes that just because two variables are correlated does not mean one causes the other, a common misconception that can lead to flawed reasoning and decision-making.
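
One way a correlation can arise without causation is a shared underlying driver (a confounding variable). The sketch below constructs two series that both depend on temperature and measures their Pearson correlation; the data, variable names, and coefficients are invented for illustration and do not come from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily high temperatures in degrees Celsius.
temperature = [10, 14, 18, 22, 26, 30]

# Both series are driven by temperature; neither depends on the other.
ice_cream_sales = [3 * t + 5 for t in temperature]
sunburn_cases = [2 * t - 10 for t in temperature]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"Correlation between ice cream sales and sunburn cases: {r:.2f}")
```

The two series are almost perfectly correlated by construction, yet banning ice cream would not prevent sunburn: the relationship disappears once temperature is accounted for.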
💡Data Collection
Data collection is the process of gathering and measuring data in a methodical way. The video discusses the importance of careful data collection and the pitfalls of using poor or biased data sources. It also touches on the concept of survey bias and leading questions, which can skew the results of data collection.
💡Pig in a Poke
The phrase 'pig in a poke' is an old English saying used in the video to illustrate the idea of being sold something that appears good but turns out to be something else entirely. It is used as a metaphor for the deceptive use of statistics, where data might appear reliable at first glance, but upon closer inspection, it is misleading or incorrect.
💡Democratization of Data
Democratization of data refers to making data accessible and understandable to a wider range of people within an organization. The video advocates for this approach to ensure that more individuals can analyze and verify the accuracy of data, thus preventing manipulation and promoting transparency in data-driven decision-making.
Highlights

David Sigerson discusses the importance of questioning the data we encounter daily and its potential to be misleading.

The presentation is inspired by the book 'How to Lie with Statistics' and updated with contemporary examples.

Banksy's 'Peckham Rock' art piece in the British Museum is used as an analogy for how people often fail to recognize deception.

The concept of 'modern caveman' is introduced to illustrate how our primal instincts can lead us to make quick, potentially incorrect judgments.

Sigerson emphasizes the role of trust in professions and how it can influence our acceptance of statistical information.

The misuse of a statistic during the Brexit campaign is highlighted to show the long-term impact of misleading data.

The talk explores how visual analytics can be manipulated to present data in a more favorable, yet potentially deceptive, light.

The talk gives examples of how companies like Apple use visual representation to their advantage, sometimes at the cost of data accuracy.

The unconscious 'System 1' thinking is explained as a factor in how people quickly make decisions based on data presented to them.

The difference between means and medians is discussed as a way to either reveal or hide income inequality.

The concept of 'anchoring' is introduced to explain how initial figures can influence our perception and decision-making.

Framing effects are shown to have a significant impact on how we perceive statistics, with examples of positive and negative frames.

The talk stresses the importance of understanding the context of data, noting that our brains often fail to grasp the complexities of percentages and compound interest.

The 'pig in a poke' analogy is used to caution against accepting data or deals that seem too good to be true without proper scrutiny.

The challenges of data collection are discussed, including the potential for bias, lack of clarity, and leading questions in surveys.

The role of correlations in data analysis is examined, with a warning against assuming causation from correlated data.

The potential pitfalls of AI and machine learning are highlighted, particularly the issues of extrapolation and bias.

The necessity of the scientific method in data analysis is emphasized, including hypothesis testing and peer review.

Tableau's role in data analysis is presented as a tool for hypothesis testing, data sharing, and collaboration to ensure data accuracy.

Sigerson concludes with a call to action for individuals to engage critically with data and to be vigilant against potential manipulation.

Transcripts