How to Lie with Baseball Stats | Baseball Bits

Foolish Baseball
25 Feb 2023 · 17:40
Educational, Learning
32 Likes 10 Comments

TLDR
The video script discusses the potential for misleading information in baseball statistics. It highlights how various statistical tricks, such as manipulating y-axis scales or using technical truths that are misleading, can create false impressions about players' performances. The speaker emphasizes the importance of context, including era adjustments, park factors, and defensive metrics, to accurately interpret and compare players across different time periods. The narrative also touches on the evolution of baseball analytics and the risks of applying modern metrics to historical players, cautioning against anachronistic evaluations.

Takeaways
  • 📊 Manipulating data presentation, such as not starting the y-axis at zero or using a double y-axis, can create misleading impressions in statistics.
  • 💡 The key to 'lying' with statistics is to use technical truths: stats that are true but misleading when taken out of context.
  • ⚾️ Comparing ERA (earned run average) across different years can be deceiving without considering the overall run environment and league averages.
  • 🏆 Plus stats like ERA+ and OPS+ provide a more accurate comparison by relating individual performance to the league average.
  • 📈 Player performance can be significantly influenced by the ballpark, and changes to a ballpark can alter a player's stats.
  • 🥊 Small sample sizes can lead to misleading conclusions; larger samples are needed for reliable statistics.
  • 🔄 Changes in league trends, such as strikeout rates, can impact player stats and should be considered when evaluating performance.
  • 🏅 Defensive stats such as errors can be subjective because they depend on human discretion; advanced metrics provide a more accurate assessment of a player's defensive abilities.
  • 🌟 A player's total career and peak performance should be considered, rather than just rate stats like batting average or on-base percentage.
  • 📉 Applying new stats and analytics to historical players can lead to misinterpretations of their true value and abilities.
  • 🔄 The relative strength of the league and the talent pool can fluctuate over time, affecting the dominance and perception of players from different eras.
Q & A
  • What is the main message of the transcript about baseball statistics?

    -The main message is that while statistics can be manipulated to tell a false narrative, the true understanding of a player's performance comes from using the right metrics and considering the context in which they played.

  • How does the speaker use the concept of 'technical truths' in baseball statistics?

    -The speaker uses 'technical truths' to refer to stats that are factually correct but can be misleading when taken out of context or used to support a false narrative. They emphasize that the key to not lying is to use these stats appropriately and in context.

  • What is the significance of the year 1968 for the Boston Red Sox pitching staff?

    -In 1968, the Boston Red Sox pitching staff posted a team earned run average (ERA) of 3.33, which looks better than their 1999 team ERA of 4.00. However, the speaker points out that this comparison is misleading because the game changed over time, for example in the height of the mound and the use of a livelier ball.

  • What is 'ERA+' and how does it help in comparing pitchers across different eras?

    -ERA+ is a stat that compares a pitcher's earned run average (ERA) to the league average for a given year. It is scaled so that 100 is league average, with higher values indicating better performance. This allows for a more accurate comparison of pitchers across different eras, as it accounts for changes in the run environment.

  • How does the speaker address the issue of sample size in baseball statistics?

    -The speaker emphasizes that small sample sizes can lead to misleading conclusions. They argue for using larger sample sizes to get a more accurate understanding of a player's true abilities and to avoid drawing false conclusions from short-term performance.

  • What is the importance of considering a player's total career and peak when evaluating their performance?

    -Evaluating a player's total career and peak provides a more comprehensive understanding of their performance over time. It helps to avoid misleading comparisons based on selective use of data from specific periods of a player's career.

  • How does the speaker discuss the limitations of traditional defensive metrics like fielding percentage?

    -The speaker points out that traditional defensive metrics like fielding percentage do not account for the difficulty of the plays or the overall defensive environment. They advocate for advanced defensive metrics, such as Total Zone, Defensive Runs Saved, and Statcast's Outs Above Average, which provide a more accurate picture of a player's defensive contributions.

  • What is the significance of the 'defensive spectrum' in evaluating defensive players?

    -The defensive spectrum is a concept that attempts to adjust for the relative difficulty of different positions. It helps to evaluate the defensive value of players at different positions more fairly, recognizing that playing a position like shortstop well is more valuable than playing first base well, even if the raw defensive stats are not as high.

  • How does the speaker illustrate the danger of applying modern sabermetric understandings to players from the past?

    -The speaker uses the example of Roy Cullenbine, who had a highly valuable final season by modern metrics but was underappreciated in his time. This demonstrates that the priorities and understanding of what makes a player valuable have changed over time, and applying modern analytics to past players can lead to incorrect assessments of their performance.

  • What is the speaker's stance on the use of advanced stats in baseball?

    -The speaker advocates for the use of advanced stats to gain a more accurate and nuanced understanding of players' performances. However, they caution against using these stats to mislead or to make unfair comparisons, emphasizing that they should be used for good, such as winning arguments on the internet.

Outlines
00:00
📊 Misleading Baseball Statistics

This section discusses the potential for deception in baseball statistics. It opens with the quip that if you torture data long enough, it will confess to anything. The speaker admits to lying with baseball stats for years and shows examples of misleading charts, such as ones whose y-axis does not start at zero or that use two y-axes. The real trick, according to the speaker, is to use 'technical truths': stats that are true but misleading. The speaker then uses the 1968 Boston Red Sox and a Jeff Mathis vs. Barry Bonds comparison to illustrate this point, emphasizing that context and era matter when interpreting stats.

05:02
πŸ† The Art of Sample Size and Sabermetrics

The second paragraph delves into the misuse of sample sizes and sabermetric stats. It highlights how small sample sizes can lead to incorrect conclusions, as demonstrated by the comparison between Ian Happ and Jacob deGrom's ERA and batting averages. The speaker argues for larger sample sizes and multi-season data for more accurate assessments. The paragraph also touches on the importance of considering a player's total career and peak performance, using Eric Hosmer and Jordon Alvarez as examples. It concludes with a discussion on the limitations of traditional stats like batting average and the value of advanced metrics like OPS Plus.

10:02
🥎 The Evolution of Baseball Metrics

This section addresses how baseball metrics have evolved and how they can be shaped by human discretion and era-specific interpretation. It starts by noting that some stats, like Defensive Runs Saved, are only available from certain seasons onward and so may not cover a player's entire career. The speaker uses Ken Griffey Jr. and Albert Pujols to show how advanced metrics can mislead without proper context. The section also introduces the defensive spectrum, which adjusts for the difficulty of different positions, and argues for judging player value with both traditional and advanced stats.

15:04
πŸ… Underappreciated Baseball Greats

The final paragraph tells the story of Roy Cullenbein, a player whose exceptional on-base skills were underappreciated in his time due to the era's focus on batting average. Despite having a high on-base percentage and setting an MLB record for consecutive games with a walk, Cullenbein's value was unrecognized, leading to his release and eventual retirement. The speaker uses Cullenbein's story to highlight the pitfalls of applying modern sabermetric understandings to past generations and emphasizes the importance of recognizing the context of each era in baseball history.
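To see how a batting-average lens can miss a player like Cullenbine, here is a minimal Python sketch comparing batting average with on-base percentage. The two season lines are hypothetical (they are not Cullenbine's actual stats) and the helper names are illustrative; OBP uses the standard (H + BB + HBP) / (AB + BB + HBP + SF) definition.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at-bat; walks and hit-by-pitches are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


# Hypothetical season lines: a patient hitter who walks a lot vs. a contact hitter.
patient = dict(hits=110, walks=130, hbp=3, at_bats=470, sac_flies=2)
contact = dict(hits=150, walks=30, hbp=2, at_bats=520, sac_flies=5)

for name, line in [("patient hitter", patient), ("contact hitter", contact)]:
    ba = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: BA {ba:.3f}, OBP {obp:.3f}")
```

The patient hitter trails badly in batting average yet reaches base far more often, which is exactly the kind of value an average-focused era could overlook.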

Keywords
💡Torturing Data
The term 'torturing data' refers to the practice of manipulating or selectively presenting data to support a desired conclusion, often at the expense of accuracy. In the context of the video, it is used to illustrate how one can deceive by presenting baseball statistics in a misleading way, such as by not starting the y-axis at zero in a chart to exaggerate differences.
💡Technical Truths
Technical truths are statistics or facts that are literally true but can be used to imply something false or misleading. In the video, the speaker tongue-in-cheek recommends technical truths as the way to make an inaccurate point, such as citing a statistic that is true but does not tell the whole story or is taken out of context.
💡ERA Plus
ERA Plus is a baseball statistic that adjusts a pitcher's earned run average (ERA) to account for the run-scoring environment of the league and the ballpark in which they pitch. A value of 100 is average, with higher values indicating better performance relative to the league average.
💡OPS Plus
OPS Plus, or On-base Plus Slugging Plus, is a baseball statistic that measures a player's offensive performance relative to the league average, with 100 being average. It takes into account a player's on-base percentage (OBP) and slugging percentage (SLG), providing a comprehensive view of their offensive contributions.
💡Sample Size
Sample size refers to the number of observations or data points collected for statistical analysis. In baseball, it is crucial for determining the reliability of a player's statistics. A larger sample size generally provides a more accurate representation of a player's true abilities.
💡Defensive Metrics
Defensive metrics are statistical measures used to evaluate a baseball player's defensive performance. These can include traditional statistics like fielding percentage, as well as more advanced metrics like Defensive Runs Saved (DRS) and Outs Above Average (OAA) from Statcast.
💡Contextualizing Stats
Contextualizing stats involves understanding and interpreting statistical data within the relevant context, such as the era in which a player performed, the ballpark factors, and changes in the game's rules and conditions. This practice is essential for making fair comparisons and accurate assessments of players' performances.
💡Sabermetrics
Sabermetrics is the empirical analysis of baseball, especially baseball statistics; the term was coined by Bill James, whose work in the 1970s and 1980s popularized the approach. It uses statistical methods to analyze baseball and make decisions that give a team the best chance to win.
💡Narrative vs. Statistics
The contrast between narrative and statistics refers to the difference between the stories or perceptions we have about players and the actual data that measures their performance. Narratives can be influenced by anecdotal evidence, personal biases, and the media, while statistics provide a more objective, data-driven view.
💡Player Evaluation
Player evaluation in baseball involves assessing a player's overall contribution to the team, taking into account both their offensive and defensive skills, as well as their consistency and impact over time. This evaluation can be based on traditional stats, advanced metrics, or a combination of both.
💡Historical Context
Historical context is the background information about the conditions and circumstances in which events occur. In baseball, understanding the historical context is crucial for comparing players from different eras and appreciating the changes in the game over time.
Highlights

The concept of using 'technical truths' in baseball statistics to make misleading points is discussed, emphasizing the importance of context and interpretation.

The transcript highlights how charts can be manipulated, such as not starting the y-axis at zero, to give a false impression of team performance.
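A minimal Python sketch of why a non-zero baseline misleads: a bar is drawn from the axis minimum upward, so its visible height is its value minus that minimum, and two close values can look very different. The winning percentages and the `apparent_ratio` helper are hypothetical, for illustration only.

```python
def apparent_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights (b over a) when the y-axis starts at axis_min."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must be below both values for the bars to be visible")
    # A bar is drawn from the baseline up, so its visible height is value - axis_min.
    return (b - axis_min) / (a - axis_min)


# Hypothetical winning percentages for two seasons.
season_a, season_b = 0.500, 0.520

print(f"axis from 0.00: bar B looks {apparent_ratio(season_a, season_b):.2f}x as tall")
print(f"axis from 0.49: bar B looks {apparent_ratio(season_a, season_b, 0.49):.2f}x as tall")
```

The underlying numbers differ by about 4%, but starting the axis at .490 makes one bar look roughly three times as tall as the other.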

The use of a double y-axis in charts can create a misleading impression, as illustrated by the comparison of Jeff Mathis and Barry Bonds' hitting stats.
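A small Python sketch of the double y-axis trick: each series is positioned relative to its own axis range, so a small number and a large one can be drawn at the same height. The home-run totals, axis ranges, and `axis_fraction` helper are hypothetical and are not the Mathis or Bonds figures from the video.

```python
def axis_fraction(value: float, axis_min: float, axis_max: float) -> float:
    """Position of a value on its axis: 0.0 is the bottom of the chart, 1.0 the top."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical home-run totals for a light hitter and a slugger.
light_hitter_hr = 5
slugger_hr = 45

# One shared axis from 0 to 50: the gap between the two is obvious.
shared = (axis_fraction(light_hitter_hr, 0, 50), axis_fraction(slugger_hr, 0, 50))

# Two separately chosen axes (left 0-5.5, right 0-49.5): both land at the same height.
dual = (axis_fraction(light_hitter_hr, 0, 5.5), axis_fraction(slugger_hr, 0, 49.5))

print(f"shared axis: {shared[0]:.2f} vs {shared[1]:.2f}")
print(f"dual axes:   {dual[0]:.2f} vs {dual[1]:.2f}")
```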

The importance of considering the era and run environment when evaluating player statistics is emphasized, using the Boston Red Sox's ERA in different years as an example.

The transcript points out how extreme some statistics look out of context, such as the 1968 Red Sox's ERA, and notes the changes in the game that affect these numbers, like the height of the mound and the use of a livelier ball.

ERA+, which compares a pitcher's ERA to the league average, is introduced as a more accurate way to assess pitching performance.
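A simplified Python sketch of ERA+, assuming the common form 100 * (league ERA / pitcher ERA), with the league ERA scaled by a park factor where 100 is neutral and values above 100 mean a hitter-friendly park. Sites such as Baseball-Reference compute park factors in more detail, so treat this as an approximation. The ERAs and park factor in the example are hypothetical.

```python
def era_plus(era: float, league_era: float, park_factor: float = 100.0) -> float:
    """Simplified ERA+: 100 * (park-adjusted league ERA) / pitcher ERA.

    park_factor is an index where 100 is neutral; above 100 means the park
    inflates scoring. 100 is league average and higher is better.
    """
    adjusted_league_era = league_era * park_factor / 100.0
    return 100.0 * adjusted_league_era / era


# Hypothetical pitchers with the same raw 3.50 ERA in different run environments.
low_scoring_year = era_plus(era=3.50, league_era=3.00)
high_scoring_year = era_plus(era=3.50, league_era=4.90, park_factor=105)

print(f"low-scoring league:  ERA+ {low_scoring_year:.0f}")
print(f"high-scoring league: ERA+ {high_scoring_year:.0f}")
```

The same raw ERA comes out below average in a low-scoring league and well above average in a high-scoring one, which is the point behind the 1968 vs. 1999 Red Sox comparison.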

The transcript discusses the impact of the steroid era on baseball statistics, and how it can skew perceptions of player performance.

OPS+ is introduced as a way to evaluate a player's offensive performance relative to the league average, providing a more nuanced view.
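OPS+ is commonly computed as 100 * (OBP / lgOBP + SLG / lgSLG - 1). The Python sketch below omits the park adjustment that published versions apply, and the slash lines and league averages are hypothetical.

```python
def ops_plus(obp: float, slg: float, league_obp: float, league_slg: float) -> float:
    """OPS+ without park adjustment: 100 * (OBP/lgOBP + SLG/lgSLG - 1)."""
    return 100.0 * (obp / league_obp + slg / league_slg - 1.0)


# Hypothetical: the same .350 OBP / .450 SLG line in two different leagues.
low_offense = ops_plus(0.350, 0.450, league_obp=0.300, league_slg=0.350)
high_offense = ops_plus(0.350, 0.450, league_obp=0.340, league_slg=0.440)

print(f"low-offense league:  OPS+ {low_offense:.0f}")
print(f"high-offense league: OPS+ {high_offense:.0f}")
```

Identical raw numbers look elite in one league and merely above average in the other.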

The transcript warns against using small sample sizes to draw conclusions about players, as these can be misleading and not representative of a player's true talent.
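To make the sample-size point concrete, this Python sketch treats each at-bat as an independent trial (a simplifying assumption) and computes the binomial standard error of a batting average. Two standard errors on either side gives a rough 95% range under a normal approximation; the .270 "true" average is hypothetical.

```python
import math


def batting_average_std_error(true_average: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average over at_bats trials."""
    return math.sqrt(true_average * (1.0 - true_average) / at_bats)


TRUE_AVERAGE = 0.270  # hypothetical underlying talent level

for at_bats in (25, 100, 600):
    se = batting_average_std_error(TRUE_AVERAGE, at_bats)
    low, high = TRUE_AVERAGE - 2 * se, TRUE_AVERAGE + 2 * se
    print(f"{at_bats:>3} AB: observed average could plausibly land between {low:.3f} and {high:.3f}")
```

Over a couple dozen at-bats, the same hitter could plausibly look like a .100 hitter or a .440 hitter; over a full season the range narrows considerably.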

The importance of considering a player's total career and peak performance is highlighted, rather than just rate stats like on-base percentage and slugging.
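A minimal Python sketch of why career and peak are worth reporting alongside a rate. The per-season value numbers are hypothetical (think wins above replacement, though any value scale works), and defining "peak" as the three best seasons is an assumption made only for illustration.

```python
def career_total(seasons: list[float]) -> float:
    """Career view: everything the player contributed."""
    return sum(seasons)


def peak(seasons: list[float], n_best: int = 3) -> float:
    """Peak view: the n best single seasons, not necessarily consecutive."""
    return sum(sorted(seasons, reverse=True)[:n_best])


def per_season_rate(seasons: list[float]) -> float:
    """Rate view: average value per season played."""
    return sum(seasons) / len(seasons)


# Hypothetical per-season value for two players.
short_and_bright = [6.5, 7.0, 6.0]
long_and_steady = [3.0, 4.5, 5.0, 4.0, 3.5, 4.0, 3.0, 2.5, 2.0, 1.5]

for name, seasons in [("short and bright", short_and_bright), ("long and steady", long_and_steady)]:
    print(f"{name}: rate {per_season_rate(seasons):.1f}, "
          f"peak {peak(seasons):.1f}, career {career_total(seasons):.1f}")
```

The short career wins on rate and peak while the long one wins on total, so quoting only one view is an easy way to tell a technically true but one-sided story.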

The transcript discusses how defensive metrics can be misleading and the value of advanced defensive statistics like Total Zone, Defensive Runs Saved, and Outs Above Average.
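Fielding percentage is (putouts + assists) / (putouts + assists + errors), so it only judges the balls a fielder actually reached. This Python sketch uses two hypothetical shortstop lines to show how a rangier fielder can post a lower fielding percentage while making many more plays.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Traditional fielding percentage: (PO + A) / (PO + A + E)."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances


# Hypothetical shortstops over a similar number of innings.
rangy = dict(putouts=240, assists=470, errors=18)        # reaches many balls, boots a few
sure_handed = dict(putouts=200, assists=390, errors=6)   # few errors, far fewer plays made

for name, line in [("rangy", rangy), ("sure-handed", sure_handed)]:
    plays_made = line["putouts"] + line["assists"]
    print(f"{name}: FPCT {fielding_percentage(**line):.3f}, plays made {plays_made}")
```

Balls the sure-handed fielder never reached do not count against him at all, which is the gap that range-based metrics try to fill.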

The narrative addresses the limitations of certain stats when applied to players from different eras, using the career of Roy Cullenbine as an example.

The transcript emphasizes the need to understand the context of historical baseball statistics and the changing priorities of player evaluation over time.

The concept of 'defensive spectrum' is introduced, which attempts to adjust for the relative difficulty of different fielding positions.
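A minimal Python sketch of how a defensive-spectrum adjustment can be applied: fielding runs measured against others at the same position, plus a credit or debit for the position itself. The adjustment values are hypothetical placeholders, not any published system's numbers; only the direction (shortstop credited, first base debited) follows the spectrum described here.

```python
# Hypothetical positional adjustments in runs per full season (placeholders only).
POSITIONAL_ADJUSTMENT = {"SS": 7.0, "1B": -12.0}


def defensive_value(fielding_runs: float, position: str) -> float:
    """Fielding runs relative to others at the position, plus a position credit or debit."""
    return fielding_runs + POSITIONAL_ADJUSTMENT[position]


average_shortstop = defensive_value(0.0, "SS")        # merely average at a hard position
excellent_first_baseman = defensive_value(8.0, "1B")  # very good at an easy position

print(f"average shortstop:       {average_shortstop:+.1f} runs")
print(f"excellent first baseman: {excellent_first_baseman:+.1f} runs")
```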

The transcript cautions against using modern analytics to judge players from past generations, as the evaluation criteria and the talent pool have changed significantly.

The story of Roy Cullenbine's final season is used to illustrate the pitfalls of applying contemporary sabermetric understandings to players from the past.

The transcript concludes with a reminder to use statistical analysis responsibly and for the purpose of fostering good discussions about baseball.
