How Not to Fall for Bad Statistics - with Jennifer Rogers
TL;DR
In this talk, the speaker discusses how to interpret risk in everyday life, debunking common misconceptions and providing a toolbox for understanding headlines about health risks. They explore the difference between relative and absolute risks, the importance of considering confounding factors, and the misuse of statistics in advertising. The speaker uses humor and real-life examples to make complex statistical concepts accessible and engaging.
Takeaways
- The media often presents risk information in a way that can be misleading, focusing on relative risks rather than absolute risks, which can distort the true level of danger.
- When evaluating risk, it's crucial to consider the actual numbers and the context in which they are presented, rather than just the sensational headlines.
- Public perception of risk can differ sharply from the actual data. For example, crocodiles kill more people than hippos, around 1,000 a year versus 500 according to the World Health Organisation, contrary to what many believe.
- The perceived risk of an activity depends on how it is measured. Cycling comes out as more dangerous than driving a car whether measured per hour (550 versus 130 deaths per billion hours) or per mile travelled, which shows why the choice of denominator matters.
- The impact of certain foods, like bacon, on health risks is often overstated in the media. It's important to look at the actual increase in risk (absolute risk) rather than just the percentage increase (relative risk).
- The effectiveness of interventions like speed cameras can be misinterpreted because of regression to the mean, where improvements are attributed to the intervention rather than to natural fluctuations in the data.
- When analyzing data, correlation does not necessarily imply causation. It's essential to consider other factors that might explain the observed relationship.
- Spurious correlations can be found in any dataset, and they can be humorous or misleading. It's important to ask whether there is a true causal relationship or whether other factors are at play.
- The concept of regression to the mean is relevant in many areas, including sports, where temporarily high or low performance can be misattributed to external factors like management changes.
- In comparing cities or regions for safety, such as London and New York, short-term statistics can be misleading. Long-term trends provide a more accurate picture of relative safety.
- Graphical representations of data can be manipulated to tell a specific story, sometimes at the expense of accuracy. It's important to scrutinize the scale and the actual numbers presented in graphs.
Q & A
What is the main topic of the talk?
-The main topic of the talk is understanding risk and how to make sense of risk-related headlines.
Why are headlines about risk important for our daily lives?
-Headlines about risk are important because they are supposed to inform our day-to-day decisions and help us live longer, healthier lives.
What is the first risk-related question the speaker asks the audience?
-The first risk-related question the speaker asks is which animal is more dangerous: crocodiles or hippos.
According to the World Health Organisation, which animal causes more deaths, crocodiles or hippos?
-According to the World Health Organisation, crocodiles cause more deaths than hippos, with 1,000 deaths a year compared to 500 from hippos.
What is the difference between relative risk and absolute risk?
-Relative risk tells you the risk in one group compared to another, while absolute risk gives you the actual probability of an event occurring.
What is the example used in the talk to explain the difference between relative and absolute risk?
-The example used is the risk of pancreatic cancer from eating bacon daily, which increases the relative risk by 20%, but in absolute terms, it means an increase from 5 to 6 cases in every 400 individuals.
What is the issue with comparing bacon and smoking as cancer risks based on statistical significance alone?
-Comparing bacon and smoking based on statistical significance alone is misleading because it ignores the actual magnitude of the risks, which are vastly different.
What is the concept of 'regression to the mean' and how is it demonstrated in the talk?
-Regression to the mean is the tendency for extreme results to be followed by more average results. It is demonstrated through a dice-rolling exercise where 'accidents' decrease after 'speed cameras' are placed, but this decrease is due to chance rather than the cameras.
Why did the speaker criticize the BBC's coverage of the story about living near a busy road and the risk of dementia?
-The speaker criticized the BBC's coverage because it focused on a 7% increased risk of dementia from living near a busy road, while ignoring other factors like smoking and obesity, which have a much greater impact on dementia risk.
What is the issue with small sample sizes in surveys and how can it affect the reliability of the results?
-Small sample sizes can lead to high uncertainty and unreliable results because they may not accurately represent the larger population. The confidence interval for the results may be wide, making it difficult to draw definitive conclusions.
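The link between sample size and the width of a confidence interval can be sketched numerically. This is a minimal illustration using the normal-approximation interval for a proportion; the survey counts below are hypothetical, not figures from the talk.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)

# Hypothetical survey: 80% of respondents say "yes", at two sample sizes.
small_lo, small_hi = proportion_ci(24, 30)      # 24 of 30 respondents
large_lo, large_hi = proportion_ci(800, 1000)   # 800 of 1,000 respondents

small_width = small_hi - small_lo   # roughly +/-14 percentage points
large_width = large_hi - large_lo   # roughly +/-2.5 percentage points
```

The same 80% headline figure is far less informative from 30 respondents than from 1,000: the small-sample interval is several times wider.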
How can graphics in advertisements or media sometimes mislead viewers about statistical data?
-Graphics can mislead by using incorrect scales, presenting data in a misleading way, or not clearly showing the uncertainty in the data. This can result in viewers having a distorted understanding of the actual statistics.
What is the speaker's role in the Royal Statistical Society and what are they working on?
-The speaker is a member of the Royal Statistical Society and is involved in a project aimed at improving data ethics in advertising.
Outlines
Understanding Risk in Daily Headlines
The speaker begins by discussing the prevalence of risk-related headlines in the media and their impact on public perception. They emphasize the importance of understanding how to interpret these risks, particularly the difference between relative and absolute risks. The speaker introduces the concept of a 'toolbox' for evaluating risk-related information and engages the audience with a survey on risky activities, highlighting common misconceptions about the dangers of animals, sports, and transportation modes.
The Risky Business of Bacon
This paragraph delves into the controversy surrounding bacon and its alleged link to cancer. The speaker critiques the media's portrayal of bacon as a significant cancer risk, explaining the difference between relative and absolute risk. They use the example of pancreatic cancer to illustrate how a 20% increased risk in relative terms translates to a much smaller increase in absolute terms. The speaker also addresses the World Health Organization's classification of processed meats as carcinogenic, comparing it to the risk of smoking, and emphasizes the need for a more nuanced understanding of risk.
Risk Perception and Measurement
The speaker continues the discussion on risk by examining how it is measured and perceived. They challenge the audience to think critically about the methods used to assess risk, using examples of cycling versus driving and the variability in risk assessment methods. The speaker also touches on the complexities of measuring risk in activities like flying, where the risk is not constant throughout the journey. This section underscores the importance of considering the methodology behind risk assessments.
Dementia Risk and Living Near Busy Roads
In this paragraph, the speaker critiques a study that linked living near a busy road to an increased risk of dementia. They highlight the importance of considering confounding factors and the need for a comprehensive analysis of risk factors. The speaker points out that the study did not control for family history, which could significantly influence both the risk of dementia and the likelihood of living near a busy road. They also discuss the media's selective reporting of risk factors, urging the audience to question what information is being omitted.
Regression to the Mean in Accident Statistics
The speaker introduces the concept of regression to the mean using a dice-rolling demonstration. They explain how placing speed cameras in areas with high accident rates might lead to a reduction in accidents, which could be mistakenly attributed to the effectiveness of the cameras. The speaker argues that this reduction could simply be a random fluctuation, illustrating the need for a more thorough analysis over time to determine the true impact of interventions like speed cameras.
Air Travel Safety and Statistical Fluctuations
This paragraph discusses the perception of air travel safety, particularly in light of a year with no passenger jet crashes. The speaker argues that such a statistic could be misleading and that fluctuations in crash rates are expected. They caution against overreacting to single-year data and emphasize the need for a broader perspective when evaluating safety trends. The speaker also addresses the comparison of murder rates in London and New York, highlighting the importance of considering longer-term data.
The Misuse of Statistics in Advertising
The speaker critiques the use of statistics in advertising, particularly in the context of a toothpaste advertisement. They explain the concept of uncertainty in statistical results and the importance of understanding the difference between probability theory and statistical inference. The speaker emphasizes the need for a clear understanding of what is being measured and the potential for variability in results, urging consumers to be skeptical of statistical claims in ads.
Sports Statistics and the Illusion of Causation
In this paragraph, the speaker discusses the misuse of statistics in sports, particularly in the context of managerial changes and their perceived impact on team performance. They explain how regression to the mean can explain seemingly sudden improvements in performance, which may be attributed to new managers but are actually part of the team's natural fluctuation in performance. The speaker also addresses the 'Sports Illustrated curse,' illustrating how exceptional performance can be followed by a return to average performance.
Misleading Graphics in Statistical Presentation
The speaker concludes by highlighting the importance of accurately presenting statistical information, particularly in graphics. They critique several examples of misleading or incorrect graphics, emphasizing the need for clarity and accuracy in data presentation. The speaker encourages the audience to be vigilant in interpreting statistical graphics, considering the scale, the source of the data, and the potential for misrepresentation.
Keywords
Risk
Relative Risk
Absolute Risk
Correlation vs. Causation
Confounding Factors
Regression to the Mean
Statistical Significance
Uncertainty
Confidence Interval
Data Ethics
Hypothesis Testing
Highlights
The speaker discusses the importance of understanding risk in everyday life and how to interpret risk-related headlines.
A tool box of questions to consider when evaluating risk headlines is introduced.
The audience is engaged in a survey to assess their understanding of risk, involving dangerous animals, sports, and transport modes.
Contrary to the audience's expectation, crocodiles kill more people than hippos, and cheerleading causes more accidents than baseball.
The distinction between relative and absolute risks is explained using the example of bacon consumption and pancreatic cancer risk.
A comparison of the risk of lung cancer from smoking versus pancreatic cancer from bacon shows a significant difference.
The concept of statistical significance is explained and its limitations in quantifying risk are highlighted.
The importance of considering daily habits and lifestyle factors when evaluating health risks is discussed.
The difference between correlation and causation is emphasized with humorous examples, such as fizzy drinks and teenage violence.
The use of confounding factors in statistical analysis is introduced to explain spurious correlations.
The concept of regression to the mean is demonstrated with a dice-rolling experiment to illustrate random fluctuations.
The impact of regression to the mean on the interpretation of speed camera effectiveness is discussed.
The speaker shares personal experiences of challenging misleading statistics in advertising with Ryanair.
The misuse of statistics in a toothpaste advertisement is critiqued, highlighting the issue of small sample sizes.
The importance of considering effect size alongside sample size in statistical analysis is explained.
The speaker concludes with advice on interpreting statistics in the media, emphasizing the need for critical thinking.
Transcripts
[MUSIC PLAYING]
They just weren't very good.
Anyway, here we go.
So "Living is a Risky Business," I'm
going to be talking today about risk.
We are bombarded every single day
with headlines full of all of these things
that we should and shouldn't be doing in order
to live a longer life.
You know, dementia, sitting too long may increase
a middle-aged person's risk.
And sometimes these even have numbers on them.
So an everyday painkiller doubles the risk
of heart attacks and strokes.
A child's risk of brain cancer triples after just two CT
scans.
And we're supposed to use these headlines to inform
our day-to-day lives and to help us make decisions as to how
we should be living our lives.
But how do we actually make sense of these numbers?
So throughout this talk, I'm going
to give you a little bit of a tool box
as to all of the things that you should be thinking about,
all of the questions that you should be asking yourselves,
when you see these sorts of headlines in the newspapers.
But first of all, I thought I would try and get
an idea as to how good you guys are at understanding risk.
Do you know what risky activities are?
So I'm going to do a bit of a survey.
I can't help it.
I'm a statistician.
We can't help but do surveys.
So I've got some different risk scenarios.
And I want you to tell me which is the most risky.
So the first one is I'm going to ask you which
is the most dangerous animal?
So which animal causes the most deaths?
So I'll show you both options.
And then I'm going to ask you to vote for me.
So is it crocodiles or is it hippos?
So if you think it's a crocodile, give me a cheer.
[CHEERS]
It's about half a dozen, a dozen people who think crocodile.
If you think it's a hippo, give me a cheer.
[LOUD CHEERS]
OK, that's overwhelmingly in favour of the hippo.
I can tell you it's not looking good because--
[LAUGHTER]
--crocodiles, according to the World Health Organisation,
cause 1,000 deaths a year compared with 500 from hippos.
OK, so I got two more.
Let's see if you can redeem yourself.
Which is the most dangerous sport?
So which causes the most accidents?
Is it baseball or is it cheerleading?
So give me a cheer if you think it's baseball?
[CHEER]
Give me a cheer if you think it's cheerleading.
[CHEER]
Now, it's a little bit more 50/50,
probably slightly in favour of the cheerleader.
So you have slightly redeemed yourself.
Cheerleading does cause more accidents than baseball.
OK, last one, which is the most dangerous mode of transport?
Is it riding a bike or driving a car?
So give me-- give me a cheer if it's riding a bike?
[CHEER]
Give me a cheer for driving a car.
[CHEER]
That's pretty 50-50 actually.
OK, let me do that one more time.
Riding a bike?
[CHEER]
Driving a car?
[CHEER]
I'd say that's about 50/50.
I can tell you that actually riding a bike
is more dangerous than driving a car, 550 deaths per billion
hours versus 130.
This is a really interesting one though,
because it raises all sorts of questions as to how
do we even measure risk?
I've chosen to measure this based on the amount of time
a person would spend doing each activity.
Some people may choose to think about the number of accidents
per miles travelled.
But cycling still comes out as more dangerous.
Some people might think to do this, the number of journeys
that you take.
If we think about other modes of transport,
such as flying in airplanes, the risk
isn't constant the whole time that you're up in an airplane.
It's more risky as you take off and as you land.
So even just how do we measure risk
is a really interesting question in itself.
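The point that the choice of denominator matters can be sketched with the per-hour figures from the talk. The average journey durations below are made-up assumptions purely for illustration; only the deaths-per-billion-hours numbers come from the talk.

```python
# Deaths per billion hours of activity, as quoted in the talk.
deaths_per_bn_hour = {"cycling": 550, "driving": 130}

# Hypothetical average journey durations in hours -- illustrative assumptions only.
avg_journey_hours = {"cycling": 1 / 3, "driving": 2 / 3}

# Re-expressing the same underlying data per billion *journeys* changes the
# numbers, though with these assumptions cycling still comes out riskier.
deaths_per_bn_journey = {
    mode: deaths_per_bn_hour[mode] * avg_journey_hours[mode]
    for mode in deaths_per_bn_hour
}
```

Changing the denominator (hours, miles, journeys) can change the size of a gap, and in principle even the ranking, which is why the measurement method always needs to be stated.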
So now we've established that you're OK at risk,
I think that's a fair assessment.
As I said, I want to talk to you about risks
that you see every day and give you a toolbox as to everything
you should be asking.
And I want to start out by talking about the humble bacon
sandwich.
[LAUGHTER]
Now, according to the headlines bacon
is one of the worst things you can be eating.
It causes all sorts of different types of cancer.
This headline here says, "Daily fry-up boosts
cancer risk by 20%."
So if you eat bacon on a daily basis,
you increase your risk of pancreatic cancer by 20%.
And that's a shocking statistic, that actually
caused bacon sales to plummet.
Is it really though something that
we need to be worried about?
Crucially, when we see headlines like this,
they're actually giving us what we call relative risks.
They're only telling us what the risk is in one group
relative to another.
So I know that if I eat bacon, I've
got a 20% increased risk compared
to those who don't eat bacon.
But I don't know anything about what the risk actually is.
And that's where absolute risks come in.
So absolute risks depend on the numbers
that are associated with each of these things.
So how do I take a relative risk and turn it
into an absolute risk?
First of all, I need to know what my chances are
of getting pancreatic cancer.
And according to Cancer Research UK, we have a 1
in 80 lifetime risk of pancreatic cancer.
So what does that mean?
That means if we were to take 400 individuals who
didn't eat bacon, we would expect five of them
to get pancreatic cancer anyway.
My flicker has decided to stop working.
So if we then look back at our headline,
our headline says that our daily fry-up boosts our cancer risk
by 20%.
20% is a fifth.
And what's a fifth of five?
It's just one, meaning that our risk goes
from five in every 400 individuals
to six in every 400 individuals.
It's only an extra one person in every 400.
So whilst that 20% increase sounded really scary,
a headline that said it increases your risk by
an extra one person in every 400 wouldn't sound anywhere near
as scary, or anything you needed to be worried about.
There was also a headline that said that bacon, ham,
and sausages were now as big a cancer threat as smoking,
the World Health Organisation warned.
Now, the reason for this is that the World Health Organisation
produces these lists of known risk factors
for different types of cancer.
And smoking was already on there as a known risk
factor for lung cancer.
And they were saying that because this processed meat was
now being added to that list for the first time,
it meant that they were as risky as each other.
Now, these lists are based on something called
statistical significance.
And statistical significance just
tells us whether or not something definitely
does or definitely doesn't cause cancer.
It doesn't quantify that risk in any way.
So how do the risks for smoking and lung cancer
compare to those risks that we've just
seen for bacon and pancreatic cancer?
So if we take our 400 individuals again,
if you got 400 people who don't smoke,
you would expect four of them to get lung cancer anyway.
If you smoke 25 or more cigarettes every single day,
that goes up 24 times, to 96 in every 400 individuals.
So that's an extra 92 in every 400
compared to that extra one in every 400
for the bacon and pancreatic cancer.
So, yes, they may both be statistically
significant in causing cancer.
But to say that they now were as big a cancer
threat as each other, they were as risky as each other,
is absolutely ludicrous because we
can see that there is a huge difference in the risks.
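The arithmetic behind this comparison can be written out directly. All the figures here (the 1-in-80 lifetime risk, the 20% relative increase, the 4 baseline lung-cancer cases, and the 24-times factor for heavy smoking) are the ones quoted in the talk.

```python
# Baseline expectations among 400 people, using the talk's figures.
baseline_pancreatic = 400 * (1 / 80)  # 5 expected cases with no daily bacon
baseline_lung = 4                     # 4 expected lung-cancer cases among non-smokers

# Apply the quoted relative risks.
bacon_cases = baseline_pancreatic * 1.20  # +20% relative risk from daily bacon
smoking_cases = baseline_lung * 24        # 24x for 25+ cigarettes a day

extra_from_bacon = bacon_cases - baseline_pancreatic   # 1 extra case per 400
extra_from_smoking = smoking_cases - baseline_lung     # 92 extra cases per 400
```

Both exposures are "statistically significant", but the absolute risks differ by nearly two orders of magnitude: one extra case per 400 versus 92.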
What we also need to think about when we see these headlines
is that it compared those who eat bacon every single day
with those who never eat it.
And the risk was only increased by an extra one
person in every 400.
If you only eat bacon, say, once a week
on a Saturday morning as a treat to yourself,
it's going to have an even smaller effect.
And so it's going to be absolutely tiny, this risk.
Plus, what you also need to think
about is if you're eating bacon for breakfast every day,
you're not eating fruit for breakfast every day.
You may be more likely to have an unhealthy lifestyle
in general.
And how do we know that it's the bacon that's actually
causing this increased risk of cancer,
and it isn't another one of these unhealthy lifestyle
factors instead?
And what we say in statistics is that correlation doesn't always
mean causation.
Here are some of my favourite headlines for demonstrating
correlation versus causation.
So, yeah, fizzy drinks make teenagers violent.
A fizzy drink a day makes teenagers behave aggressively.
Children drinking fizzy drinks regularly are
more likely to carry a gun.
[LAUGHTER]
Now, it could be that drinking fizzy
drinks makes teenagers violent.
Or it could be that there's some other social demographic factor
that means a teenager is more likely to have
fizzy drinks in their diet.
And they're also more likely to be violent.
Or it could be that being violent is thirsty work.
And at the end of it, you want a fizzy drink.
[LAUGHTER]
We don't know which way around that relationship goes.
One of the best ways to think about correlation
versus causation is if you think about ice cream sales.
As ice cream sales go up, so do the number of drownings.
[LAUGHTER]
So does that mean that ice cream causes drownings?
Both of these things are affected
by something else, hot weather.
As the temperatures increase, we eat more ice cream,
as temperatures increase, we go into the sea more often,
meaning that there are naturally just more drownings.
Once we actually take that into account,
that direct relationship between ice creams and drownings
disappears.
We call this a confounding factor.
And once we account for the confounding factor
in our analysis, then that direct relationship
between these two things disappears.
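The ice-cream-and-drownings effect can be reproduced in a small simulation: generate a confounder (temperature) that drives both variables, then check that the correlation disappears once temperature is regressed out of each one. The coefficients and noise levels are arbitrary illustrative choices, not real data.

```python
import random
import statistics

random.seed(0)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n = 5000
temperature = [random.gauss(20, 5) for _ in range(n)]
# Hot weather drives both ice cream sales and drownings (plus independent noise).
ice_cream = [2.0 * t + random.gauss(0, 3) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 3) for t in temperature]

raw = pearson(ice_cream, drownings)  # strong correlation, no direct causal link

def residuals(y, x):
    """What is left of y after fitting a simple least-squares line on x."""
    slope = pearson(x, y) * (statistics.pstdev(y) / statistics.pstdev(x))
    intercept = statistics.fmean(y) - slope * statistics.fmean(x)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

# Correlation of the residuals: the confounder has been accounted for.
partial = pearson(residuals(ice_cream, temperature),
                  residuals(drownings, temperature))
```

The raw correlation is strong, but the residual (partial) correlation is close to zero: once the confounding factor is accounted for, the direct relationship vanishes.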
There's a really nice website that I like to go on,
that allows you to correlate all these weird and wonderful
things with each other.
So I got on there.
And I've picked out some of my favourites.
If you google spurious correlations,
it's the first one that comes up.
You can have a lot of fun with it.
I spent too much time trying to find interesting correlations
for my presentations.
But there is a 98% correlation between the amount
of money spent on admission to spectator
sports and the number of people who died
by falling down the stairs.
So does this mean as we're all spending money
to go to these big arenas, we're all falling down
the stairs at the same time?
I don't know.
There is a 94% correlation between maths doctorates
awarded and the amount of money spent on pets.
Now, I'm a dog lover.
Am I a dog lover because I'm a mathematician?
I don't know.
My absolute favourite though is that there is a 95% correlation
between the per capita consumption of cheese
and the number of people who died by becoming tangled
in their bed sheets.
So does this mean that we shouldn't eat cheese before we
go to bed because we might die?
These are two things that are obviously
just-- they just happen to be correlated with each other.
And it doesn't mean that one is causing the other.
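How easily trending series correlate by chance, as on the spurious-correlations website, can be shown in a few lines: generate a handful of random walks that are independent by construction, then look for the strongest pairwise correlation.

```python
import random
from itertools import combinations

random.seed(42)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def random_walk(length):
    """A trending series: cumulative sum of independent Gaussian steps."""
    walk, level = [], 0.0
    for _ in range(length):
        level += random.gauss(0, 1)
        walk.append(level)
    return walk

# 40 series, each generated completely independently of the others.
series = [random_walk(50) for _ in range(40)]

# The strongest correlation among all 780 pairs -- high purely by chance.
best = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
```

With enough series to compare, some pair is almost guaranteed to correlate strongly, even though no pair has any causal connection: exactly the cheese-consumption-versus-bedsheet-deaths situation.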
Now, you might be saying to yourself,
this is all well and good.
This is all very funny.
I know that cheese doesn't cause death by bedsheet.
When do I actually really need to think
about this in real life?
I was asked to comment on a story that was run by the BBC--
and it was January 2017, so just over two years ago now--
that said, living near a busy road increased
your risk of dementia.
It apparently increased your risk of dementia by 7%.
And the BBC got in touch and wanted me to comment on it.
And they wanted me to talk about this relative
versus absolute risk.
So I went and I had to look at the paper--
it was published in The Lancet--
just to try and get an idea as to what the absolute numbers
might have been.
And whilst I was looking at this study,
I realised that they hadn't controlled for family history
in their analysis.
And I argued that we know there's a huge family history
element to dementia.
But you could also argue that there's
a family history element as to where you might live.
If you grew up in the middle of the countryside,
you might be more likely to continue
living in the countryside as an adult.
If you grew up in a big city, you
might be more likely to live in a big city as an adult.
And so you've got this big family history
element to dementia and family history element
as to where you might live.
And I said the fact that that wasn't
accounted for in the analysis was a major let down
in the study.
Also while I was looking at this,
I looked at the supplementary material.
And it looked to all of these other things
that they had looked at, that might
be associated with dementia.
So this top row here are all of these different factors
that they thought might be associated with dementia.
So, for example, where you can see that smoking,
that 1.3 means that smoking increases your risk of dementia
by 30%.
The obesity, so obese versus a normal weight,
increases your risk of dementia by 64%.
And yet the newspapers had chosen
to really focus on this living near a busy road increasing
your risk of dementia by 7%.
And I said, you know, before you went
to pick up sticks and move to the countryside,
there are lots of other things that you
could do that would have a bigger
effect on your risk of dementia, quitting smoking,
losing weight, and higher education versus lower
education.
But the newspapers just chose not to report any of this.
And so what I would always say to you when
you see these headlines, have a little think to yourself,
what aren't they telling us?
What else could be going on?
So I'm asked quite a lot actually
to comment on stories that appear in the press.
And this was another one that I was
asked to comment on at the beginning of last year, that
said that 2017 was the safest year for air travel
as fatalities fall.
So in 2017, there were no deaths anywhere in the world
caused by passenger jet crashes.
And early into 2018, there then was a passenger jet crash.
And there was all this sort of investigation
as to has everything gone wrong?
Was 2017 the safest year we're ever going to have?
Do we now have to start investigating to try and figure
out what's happened?
And so I was asked to comment on this story.
And I want to do a little demonstration with you.
So there are dice in these two rows.
Some of you will have been given dice
or had dice underneath your seat as you sat down.
Could you just wave your dice subtly, please?
So we're going to do a little demonstration with these dice.
And we're going to do a little demonstration that
thinks about these things, speed cameras.
So everybody's favourites, I know.
So when speed cameras first came in,
the government needed to give some serious thought as
to where they might put them.
We couldn't put speed cameras absolutely everywhere.
Was there some sort of sensible strategy
that we could adopt, to decide where we might put those speed
cameras?
And we're going to recreate that exercise now.
And what the government decided to do
was to try and identify all the accident hotspots.
And they said that those were obviously
the most dangerous places.
And those were the ones that were in the highest need
of getting these speed cameras.
And we're going to recreate that now.
So all of you who have got the dice, in a second I'm
going to ask you to roll them.
I'm very aware of the fact that there's not that much room.
So can I just suggest you give them a good shake in your hand
and just drop them on the floor.
But, yes, so I'm going to ask you to do that.
I want you to do it twice.
And count up the score that you get.
And then we're going to decide where we're
going to put our speed cameras.
So if you could all roll them for me twice,
that would be marvellous.
There should be quite a few more on that row actually.
There should be more than one.
Oh, you've handed them back.
OK, that's fine.
OK, right, so did anybody get a 12?
Anybody get an 11?
Oh, I heard a twitch then.
Anybody get a 10?
I only got 1.
OK, right, what we do is we're going to redo this.
I usually have more dice.
I gave this talk last week to a group of teenagers.
And they stole my dice because they all
like to take souvenirs.
So I'm now doing a probability problem with fewer dice
than I normally would have.
So bear with me.
And we'll just repeat that again.
So if you could all just give them a really good roll
and repeat that.
Do it twice for me.
And we'll see what we get.
This is what happens when you present to teenagers.
They like souvenirs.
OK, right, how about this time.
Did anybody get any 12s?
Any 11s?
We've got two, OK, right, brilliant.
3?
3?
Sorry, it's the lights.
OK, so I'm going to give you a speed camera.
So we got a speed camera there.
You happen to be all spread out as much as you possibly
could be.
Oh, thank you very much.
And a speed camera over here.
Can I just ask you to pass that back behind?
That'll be brilliant.
Thank you.
So now we've got our speed cameras in place.
We now need to see if they've worked.
So what I want you to do is to all repeat the same thing
again.
Last time, I promise.
If you could all just roll your dice for me again, twice.
OK, where we have speed cameras, what did you get this time?
6.
6.
8.
8.
7.
7.
So we can see where we've got our speed
cameras we've seen a reduction in the number of accidents--
[LAUGHTER]
--meaning our speed cameras have worked right?
Maybe not.
OK, so this is a really nice demonstration
of what we call regression to the mean.
So what do I mean by regression to the mean?
Regression to the mean is we all behave according
to some sort of average.
But we don't get exactly the same value every single time.
We have these random fluctuations around it.
I simulated here a number of accidents,
where I kept the average a constant 10 all of the time.
But you see that I don't always get 10 accidents every month.
Sometimes I see higher than that.
Sometimes I see lower than that.
And this is what you would expect to see just by chance.
And crucially, it was one of these random highs
where I actually chose to put in my intervention.
You didn't have a higher chance of rolling
a high number that then changed when I gave you a yellow hat.
I chose to put the interventions in those places that
were randomly high the first time around.
And regression to the mean tells me
that I would have expected them to be lower the next time
around just by chance.
It's regressing to the mean.
So regression to the mean tells you
I've got something lower the first time around.
I expect it to be higher the next time around.
And this is exactly what happened
when speed cameras came in.
The government put them in those places
with the highest number of accidents.
They saw there was a reduction.
And then they said that the speed cameras
must have been working.
And it took a group of statisticians
to come in and say, actually, you
need to be looking at this over a long period of time
to be able to say whether or not the average is changing
through time, whether or not the average number of accidents
is actually decreasing.
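The dice demonstration scales up naturally in code: give every "site" the same underlying accident process (the sum of two dice, as in the talk), put cameras at the sites that scored highest, and watch their scores fall back on their own, with no intervention at all.

```python
import random

random.seed(7)

def accidents():
    """One period's 'accident count' at a site: the sum of two dice, as in the talk."""
    return random.randint(1, 6) + random.randint(1, 6)

n_sites = 1000
period1 = [accidents() for _ in range(n_sites)]

# Put 'speed cameras' at the accident hotspots: sites that rolled 11 or 12.
camera_sites = [i for i, a in enumerate(period1) if a >= 11]

# Second period: every site behaves exactly as before -- the cameras do nothing.
period2 = [accidents() for _ in range(n_sites)]

before = sum(period1[i] for i in camera_sites) / len(camera_sites)  # at least 11
after = sum(period2[i] for i in camera_sites) / len(camera_sites)   # near 7, the true mean
```

The "camera" sites were selected precisely because they were randomly high, so their next-period average falls back toward the true mean of 7 by chance alone, which is why only long-run data can show whether an intervention actually works.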
So we see regression to the mean all the time in sports.
If you think about your favourite sports teams,
they'll go on random winning streaks.
And they'll go on random losing streaks.
When they go on losing streaks, they sometimes
sack their manager.
And a new manager comes in.
And they say, oh, look, they're winning again,
must be the manager.
A lot of the time, it can be explained by regression
to the mean.
That losing streak is just a random low.
And when they bring the new manager in,
they just go back to their average performance level.
And some research has shown that actually those teams that
stick with their managers see the bounce back
in form much quicker than those that
actually bring in new managers.
There's something called the Sports Illustrated curse, which
says that when you appear on the cover of Sports Illustrated,
you then go on to perform really badly.
But it can be explained by regression to the mean.
If you think about what does it take
to appear on the cover of Sports Illustrated,
you have to be at the very top of your game, which
is going to be a combination of your natural ability.
But you're probably also going to be riding one
of these random highs as well.
And this curse isn't necessarily a curse.
It's just that random high coming to an end,
and you going back to your average ability.
So I argued that when we looked at this story here, all of this
could be explained by regression to the mean.
We would expect the number of air crashes and fatalities
to remain low.
But we are going to see these fluctuations around it.
Some years, we're just naturally going to see slightly more.
And some years we're just going to see slightly less.
And the fact that there have been none in 2017,
and then one in 2018, didn't necessarily
mean that everything had gone wrong.
And we all of a sudden needed to be having
these big investigations.
There were also stories about this time last year looking
at the London murder rate now beating New York,
as stabbings surged.
And there was a question as to whether or not
London was now a more dangerous city than New York.
And BBC's Reality Check actually looked into this.
BBC Reality Check is a really good resource:
they get statisticians to look at these sorts of claims.
And the claim was that London had
overtaken New York for murders.
And it was now more dangerous.
And they found that a selective use
of statistics over a short period of time
appeared to bear it out.
But if you looked at it over a longer period of time,
New York still appeared to be more dangerous
than London.
So while we're on the topic of airplanes, I'm
a vice president of the Royal Statistical Society.
And alongside some of the work that I do with the RSS,
I have a bit of a hobby:
I like to give Ryanair a headache.
[LAUGHTER]
So it first started out when I was
approached by BBC's Watchdog.
So Ryanair had changed their seating allocation algorithm.
It used to be that if you'd booked
as a group, when you checked in you would all
get to sit together.
And then they changed it and they
said if you didn't book seats together and pay for them,
you would be randomly scattered throughout the plane.
And loads of people started complaining to Watchdog,
saying that they thought there were too
many middle seats being given out.
So the window seat is quite desirable
because you get the nice view.
The aisle seat, you get a little bit of extra legroom.
The middle seat is seen as the least desirable seat.
But everyone seemed to be getting them.
And I thought that might be something going on there.
So they decided to send four of their researchers
on four flights.
And on every single one of the flights,
they were all allocated middle seats.
And they got in touch with me and said, hey,
what's the chances of that happening, if the seating
allocation is truly random?
So I did some stats for them.
And then I went on TV and I told them what I found.
So it wasn't actually a very complicated calculation
that we did.
They sent me the information available at the time
of check-in for each of the four flights.
So this is an example of one of them.
So when they checked into their flights
there were 23 window seats, 15 middle seats,
and 27 aisle seats available, so a total of 65 seats.
And using this, I can then work out the probability
that they're all given middle seats.
So the probability that the first person gets a middle seat
is 15 over 65 because there's 15 middle seats
and 65 seats in total.
The probability then that the next person
is given a middle seat is 14 over 64
because there's now 14 middle seats
available from 64 seats in total.
And I carry on.
So the probability of the third person is 13 over 63
and the fourth person is 12 over 62.
And if I multiply all of these together,
that gives me the probability of all four being middle seats.
And it's about 0.2%.
Which is one in 500, which isn't actually
that small a probability if you think about how many flights
Ryanair have every day.
At one in 500, it's not too surprising.
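The seat-by-seat calculation above can be checked with exact fractions, using the seat counts quoted for that first flight:

```python
from fractions import Fraction

# Seat counts at check-in for the first flight, as quoted in the talk:
# 23 window, 15 middle, 27 aisle -> 65 free seats in total.
middle, total = 15, 65

# P(all four get middle seats) = 15/65 * 14/64 * 13/63 * 12/62.
p = Fraction(1)
for k in range(4):
    p *= Fraction(middle - k, total - k)

print(p)         # 1/496, i.e. about 0.2%
print(float(p))
```

Using `Fraction` keeps the arithmetic exact, and the product happens to reduce to exactly 1 in 496, matching the "about one in 500" figure.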
But this is just one flight.
As I said, they did it on another three flights.
And they all got middle seats on those three as well.
So I did the same calculations for the other three flights.
And then I combined it altogether.
And I found that the probability of all four researchers
getting middle seats on all four flights was around 1
in 540 million.
So you were more than 10 times more likely
to win the national lottery than you were for this scenario
to happen.
But, you know, tiny probabilities
don't necessarily mean rare events.
So I went and had a look at Ryanair's facts and figures.
And they say that they only carry
130 million annual customers.
So I was pretty convinced that not
only was this a small probability,
it was a rare event.
And I was suspicious as to whether or not
there was something going on with their algorithm.
Now, they said, you know, we've got our stats.
You've got yours.
My stats are right, thank you very much.
But anyway--
And it all kind of-- it kind of died a little bit of a death.
We got some media attention.
It was in the newspapers, but then not really very much
happened.
Until a couple of months later, when 12 women
all went on holiday together.
And they all got middle seats.
And they called the Telegraph.
And the Telegraph then called me and said, hey,
we heard you did some work on this.
What are the chances?
So I went through everything that I'd
done on the Watchdog story.
And they got in touch with Ryanair.
And at that point, Ryanair admitted
that they'd been lying.
They admitted that they actually kept window and aisle seats.
They held them back when randomly allocating the seats
because those were the ones that people
were most likely to pay for.
So this random allocation wasn't a random allocation
throughout the whole plane.
It was a random row within middle seats
that you were actually getting.
And so, yeah, I was really happy that I
managed to get Ryanair to admit to their customers
that they'd lied.
And I also managed to upset them in the process
because they didn't think that the negative media
attention, including the BBC investigation, was warranted.
So let's all feel sorry for Ryanair.
However, they are the gift that keeps on giving.
So last April, they released the results of a customer
satisfaction survey.
They said that 92% of their customers
were satisfied with their flight experience.
I thought, really?
I'd been on a Ryanair flight, 92%?
So I decided to take a look at the survey.
Now, bear in mind this was an opt-in survey.
So my argument was you're only going to opt into a survey
if you're really satisfied with your experience
and you want them to know about it
or you're dissatisfied with your experience
and you want them to know about it.
This was a survey that they asked people to fill in.
So this was the 92% here.
But if we look at the options that people
got when filling out this survey,
they went from excellent to OK.
So if you were dissatisfied with your Ryanair experience,
there was no way of expressing that dissatisfaction at all.
And I argued that you just wouldn't fill out the survey.
You'd just exit, switch it off, and disappear.
So basically what you were asking was a group
of satisfied Ryanair customers just how satisfied they were
with their Ryanair experience.
And then we were really surprised
that once you combined three of the columns,
you got a high percentage.
So I went to the Times and I said as much.
And Ryanair had a really grown-up response,
where they said 95% of Ryanair customers
haven't heard of the Royal Statistical Society.
[LAUGHTER]
97% don't care what they say.
And 100% said it sounds like their people need to book
a low-fare Ryanair holiday.
I mean, the stats in that are wrong
because if 100% say we need to book a low-fare holiday then
100% of them have heard of us.
So the stats are wrong to start off with.
But one of the members of the Royal Statistical Society
noted that there were 130 million annual Ryanair
customers.
And if 5% of them had heard of the Royal Statistical Society,
that meant 6 and 1/2 million Ryanair customers
had heard of the Royal Statistical Society.
And to be honest, we'd probably take that.
But there we go.
So, yeah, I like to--
I like to give Ryanair a headache as a hobby.
It's quite fun.
Interestingly, though, my boyfriend
is currently at the end of his training to be a pilot.
And Ryanair is one of the big options
that he might want to work for.
So that's irony right there.
So as I said, I'm a member of the Royal Statistical Society.
And one of the big projects that we've got for this coming year
is actually trying to improve data ethics in advertising.
So why is this such an issue?
I'm going to play you a little advert.
And we're going to talk about adverts
in a little bit more detail.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
- Pearl Drops cleans.
Pearl Drops whitens.
Pearl Drops protects.
Pearl Drops shines.
Pearl Drops 4D Whitening System, not only whitens, but cleans,
shines, and protects, too.
Ultimate whitening, up to four shades
whiter in just three weeks.
Pearl Drops Tooth Polish, go beyond whitening.
[END PLAYBACK]
So we're used to seeing these all the time.
And we're used to seeing these things at the bottom of them
all of the time.
So there's some survey that's been done.
And so many people agree.
Now, there's lots of things wrong with this.
First of all, agree with what exactly?
I mean, there are a lot of claims in the advert.
It cleans, it whitens, it brightens.
Which of these exactly are they agreeing with?
But a lot of the time, when people hear I'm a statistician,
it's like, oh, adverts, you can't trust anything, can you?
You can't trust any of the stats in there.
They use such small sample sizes that none of the results
are reliable.
And I want to talk about this in a little bit more detail.
So when you see this 52% of 52 people agreed,
what should you be thinking about when you see this?
So I want to talk a little bit about uncertainty.
What do I mean by uncertainty?
So if I was to take 10 people and line them up here and ask
them to flip a coin 10 times, I know
that there is a 50/50 chance of getting a head or a tail.
But if they all flipped it 10 times,
they wouldn't all get five heads and five tails.
Some people might get six heads.
Some people four heads, some people might get 10 heads.
That's what I mean by uncertainty.
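That thought experiment is quick to run. A minimal sketch, with an arbitrary seed chosen purely for illustration:

```python
import random

random.seed(0)  # arbitrary seed, purely illustrative

# Ten people each flip a fair coin ten times. The underlying probability
# is exactly 50/50, but the observed head counts are noisy.
heads = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(10)]
print(heads)  # a scatter of values around 5, not a uniform row of fives
```

Each entry is one person's head count out of ten flips: same coin, same probability, different results every time.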
In statistics, we talk about the difference
between probability theory and statistical inference.
So in probability theory, we know
the underlying probability.
And yet we see noisy data when we do experiments.
So that coin example, I know the underlying probability
is 50/50.
But I see noisy data when I do different experiments with it.
A lot of the time in statistics what we're actually trying
to do is to go the other way.
And we're trying to use samples of data
that we know are noisy and subject to uncertainty
and use that to tell me something about what
the underlying probability is.
So there I had 52% of 52 people agreed.
If I'd taken a different 52 people,
I wouldn't have seen exactly 52% agree.
And I'd have seen a slightly different number.
And a different 52 people would have given me
a slightly different number again.
And ultimately, what I'm trying to do
in statistics is to take that sample
and that piece of data that I know
is noisy and subject to uncertainty,
and use that to tell me something
about what the underlying probability is on a population
level.
So what I'm really trying to do
is create a hypothesis test, to see whether or not
that percentage that I'm seeing is statistically significant.
So what do I mean by hypothesis testing?
How do I carry that out?
What I do is I formulate what we call a null hypothesis.
And a null hypothesis would be that the observations
are a result of pure chance.
So my underlying probability of people agreeing
is actually just 50/50.
It's all just down to chance.
And what I then say is let's assume that that's true.
Let's assume a null hypothesis is true.
Let's assume that the data I'm seeing is just random.
And it's just by chance.
What then is the probability of me seeing the data that I've
seen or seeing something at least as extreme
as what I've seen?
So let's break that down.
I understand that's quite a lot to get your head around.
So let's break that down for this particular example.
So my null hypothesis in this example would actually be 50%.
I'm assuming that these people have got a survey that says:
do you agree that this toothpaste whitens your teeth,
yes or no?
Pure randomness would be if they just randomly
ticked yes or no.
And so across my sample, I would expect
it to be about half yeses and half nos.
That's what pure randomness would look like, just by chance.
So 50% is what corresponds to pure chance.
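That "at least as extreme" probability under the null can be computed exactly with the binomial distribution. This is a sketch of the idea rather than the speaker's own calculation, and it uses 27 out of 52 as the whole-number version of "52% agreed":

```python
from math import comb

# A sketch of the exact test under the null: if all 52 people ticked
# yes or no purely at random (p = 0.5), how likely is a result at least
# as far from the 50/50 expectation as the one observed?
n, k = 52, 27  # 27/52 is approximately the advert's 52%

def two_sided_p(n, k, p=0.5):
    expected = n * p
    dist = abs(k - expected)
    # Add up the probabilities of every outcome at least as extreme as k.
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n + 1)
               if abs(j - expected) >= dist)

pval = two_sided_p(n, k)
print(pval)  # about 0.89: entirely consistent with pure chance
```

A p-value that large says the observed 52% is exactly what random yes/no ticking would produce, which is the point made with the confidence interval below.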
So a result leaning in the direction of agreeing,
or in the direction of disagreeing,
is actually giving me information.
That's telling me that some of the people
have an opinion as to whether they agree
or disagree with that statement.
And I've got my 52% here.
And I know that that is subject to uncertainty.
So as I said, I know that if I took a different 52 people,
I'd get something slightly different from this.
So what I can do is I can put a confidence interval on this.
And this confidence interval is related to the sample size.
And it tells me, OK, 52%, that's the best
estimate as to what the true underlying probability might
be.
But what could it be?
What values could it possibly take?
And a confidence interval then gives me
a range of values that might be plausible.
And as I increase my sample size,
I actually decrease the amount of uncertainty,
and I make my confidence interval smaller.
And what we're looking for in hypothesis testing
for a statistically significant result
is we want that confidence interval to not cross
the null hypothesis.
So my null hypothesis here was 50%.
I want my confidence interval to not cross that 50%.
If it doesn't cross that 50%,
I say it's a statistically significant result.
If it does cross it, then I say
I haven't got enough evidence to rule out pure chance.
It could just be pure chance.
So that confidence interval really
matters when I've got something close to the null hypothesis
because that 52% is really close to that 50%.
I'm going to need quite a big sample
size to be able to make that confidence interval small
enough so that it doesn't cross that 50%.
If, on the other hand, I get
a result that is a lot further away from that 50%,
I wouldn't necessarily need to have as big a sample size
because it's not as close to that null hypothesis of 50%.
It doesn't matter if the confidence interval is wider.
So yes, when you see these surveys
that are being done on small samples,
it's not always a problem.
It depends on how big your result actually is.
It's not just the sample size in itself,
but also what we call the effect size as well.
So just back to our example, we have 52% of 52 people agreed.
A 95% confidence interval on this based on these 52 people
is 38 to 66.
So 95% means that if I was to repeat this a hundred times,
I would expect 95% of them to be between 38 and 66.
And it crosses that 50% mark.
So here, this is no different from just pure randomness.
This is no different from people just flipping a coin,
saying yes or no, I agree with that statement.
If I was to take another one that said 74% of 54 men
agreed with some statement after 28 days,
the confidence interval on this is 60 to 85.
So it's a similar kind of sample size.
We've got similar sized confidence interval.
But because our treatment effect was 74% to start off with,
and that's a lot further away from the 50,
I have enough evidence here to say that there is a difference.
And actually people do have a preference.
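Both intervals can be reproduced with the standard normal approximation for a proportion. This is a sketch, not necessarily the method used in the talk: it matches the first quoted interval (38 to 66), while the second comes out at roughly 62% to 86% rather than the quoted 60 to 85, suggesting a slightly different (e.g. exact or Wilson) method was used there.

```python
from math import sqrt

# 95% confidence interval for a proportion, normal approximation.
def ci95(p_hat, n):
    se = sqrt(p_hat * (1 - p_hat) / n)           # standard error
    return p_hat - 1.96 * se, p_hat + 1.96 * se  # 95% interval

lo1, hi1 = ci95(0.52, 52)  # about 38% to 66%: crosses 50%,
print(lo1, hi1)            # so indistinguishable from coin flipping

lo2, hi2 = ci95(0.74, 54)  # about 62% to 86%: clear of 50%,
print(lo2, hi2)            # so evidence of a real preference
```

The two surveys have similar sample sizes and similarly wide intervals; the difference is entirely that 74% sits far enough from 50% for its whole interval to clear the null.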
And I'm just going to finish off now
with a couple of final graphics, that just say-- because I don't
think uncertainty necessarily has to be
very difficult to communicate.
I think--
[LAUGHTER]
When we look at the weather and we
look at when they tell you your probability of rain,
I mean, these numbers are ridiculous.
So what is it?
I mean, at 3 o'clock we have a 10% chance of rain.
That goes up to 13% at 4 o'clock and 16% at 5 o'clock.
What am I supposed to do with this information?
I don't know what the uncertainty is on that.
And they're really precise point estimates.
But it would be super-easy to communicate the uncertainty
using some sort of graphic.
Now, graphics have the ability to do great good.
They also do have the ability to do great evil.
And I just want to finish off with a couple of my favourite
bad graphics because it is something you really need
to watch out for when you're looking at stats in the media.
So there's this one, which is one of my favourites.
This is the presidential run.
So a pie chart should sum up to 100%.
[LAUGHTER]
This doesn't.
And they've obviously here asked would you back this person,
yes or no?
And then thought that a pie chart
was the most appropriate way to communicate that information.
This one, I've got no idea what they asked.
Half of Americans have tried marijuana today?
I'm not-- I don't know if I believe that.
But if 43% of them have tried it in the last year, which
includes today, how have 51% tried it?
The numbers are all wrong.
I can't figure out what's going on.
They have, however, included uncertainty.
We know it's plus or minus 4%.
But I've still got no idea what's going on.
This one from the Office for National Statistics
is a very sneaky one.
And one that shows you always need
to look at the scale of the plot, because actually this
is an increase in GDP.
Their predicted growth was upgraded from 0.6% to 0.7%.
And that's a 0.1 percentage point increase.
And that looks a lot bigger on that plot.
And if we look at the axis along the bottom,
look at the scale of that.
If you zoomed out and looked
at the whole percentage line, it would
be just a minuscule difference.
So look at the scale.
And my last one is my favourite, from Ben and Jerry's.
I don't know what world we live in where 62 is smaller than 61.
But I don't want to live in that world.
Here is an example where Ben and Jerry's have got
a story that they want to tell.
And the numbers didn't quite agree with that story.
So they decided to produce a graphic that
told that story anyway.
And hoped we wouldn't look at the numbers in enough detail.
Look at the size of that 24% compared to that 21%.
They've got a clear story that they wanted to tell.
And if you were just flicking through a magazine,
you might not necessarily look at the numbers
in as much detail.
So yes, so very, very naughty from Ben and Jerry's.
So if you look at stats in the media,
I would encourage you to think: relative
versus absolute risks, correlation versus causation.
Could this have happened just by chance,
through regression to the mean?
Yeah, eat bacon.
But don't eat cheese before you go to bed.
Thank you very much.
[APPLAUSE]