3 ways to spot a bad statistic | Mona Chalabi

I’m going to be talking
about statistics today.

If that makes you immediately feel
a little bit wary, that’s OK,

that doesn’t make you some
kind of crazy conspiracy theorist,

it makes you skeptical.

And when it comes to numbers,
especially now, you should be skeptical.

But you should also be able to tell
which numbers are reliable

and which ones aren’t.

So today I want to try to give you
some tools to be able to do that.

But before I do,

I just want to clarify which numbers
I’m talking about here.

I’m not talking about claims like,

“9 out of 10 women recommend
this anti-aging cream.”

I think a lot of us always
roll our eyes at numbers like that.

What’s different now is people
are questioning statistics like,

“The US unemployment
rate is five percent.”

What makes this claim different is
it doesn’t come from a private company,

it comes from the government.

About 4 out of 10 Americans
distrust the economic data

that gets reported by government.

Among supporters of President Trump
it’s even higher;

it’s about 7 out of 10.

I don’t need to tell anyone here

that there are a lot of dividing lines
in our society right now,

and a lot of them start to make sense,

once you understand people’s relationships
with these government numbers.

On the one hand, there are those who say
these statistics are crucial,

that we need them to make sense
of society as a whole

in order to move beyond
emotional anecdotes

and measure progress
in an objective way.

And then there are the others,

who say that these statistics are elitist,

maybe even rigged;

they don’t make sense
and they don’t really reflect

what’s happening
in people’s everyday lives.

It kind of feels like that second group
is winning the argument right now.

We’re living in a world
of alternative facts,

where people don’t see statistics
as this kind of common ground,

this starting point for debate.

This is a problem.

There are actually
moves in the US right now

to get rid of some government
statistics altogether.

Right now there’s a bill in Congress
about measuring racial inequality.

The draft law says that government
money should not be used

to collect data on racial segregation.

This is a total disaster.

If we don’t have this data,

how can we observe discrimination,

let alone fix it?

In other words:

How can a government create fair policies

if they can’t measure
current levels of unfairness?

This isn’t just about discrimination,

it’s everything – think about it.

How can we legislate on health care

if we don’t have good data
on health or poverty?

How can we have public debate
about immigration

if we can’t at least agree

on how many people are entering
and leaving the country?

Statistics come from the state;
that’s where they got their name.

The point was to better
measure the population

in order to better serve it.

So we need these government numbers,

but we also have to move
beyond either blindly accepting

or blindly rejecting them.

We need to learn the skills
to be able to spot bad statistics.

I started to learn some of these

when I was working
in a statistical department

that’s part of the United Nations.

Our job was to find out how many Iraqis
had been forced from their homes

as a result of the war,

and what they needed.

It was really important work,
but it was also incredibly difficult.

Every single day, we were making decisions

that affected the accuracy
of our numbers –

decisions like which parts
of the country we should go to,

who we should speak to,

which questions we should ask.

And I started to feel
really disillusioned with our work,

because we thought we were doing
a really good job,

but the only people
who could really tell us were the Iraqis,

and they rarely got the chance to find
our analysis, let alone question it.

So I started to feel really determined

that the one way to make
numbers more accurate

is to have as many people as possible
be able to question them.

So I became a data journalist.

My job is finding these data sets
and sharing them with the public.

Anyone can do this,
you don’t have to be a geek or a nerd.

You can ignore those words;
they’re used by people

trying to say they’re smart
while pretending they’re humble.

Absolutely anyone can do this.

I want to give you guys three questions

that will help you be able to spot
some bad statistics.

So, question number one
is: Can you see uncertainty?

One of the things that’s really changed
people’s relationship with numbers,

and even their trust in the media,

has been the use of political polls.

I personally have a lot of issues
with political polls

because I think the role of journalists
is actually to report the facts

and not attempt to predict them,

especially when those predictions
can actually damage democracy

by signaling to people:
don’t bother to vote for that guy,

he doesn’t have a chance.

Let’s set that aside for now and talk
about the accuracy of this endeavor.

Based on national elections
in the UK, Italy, Israel

and, of course, the most recent
US presidential election,

using polls to predict electoral outcomes

is about as accurate as using the moon
to predict hospital admissions.

No, seriously, I used actual data
from an academic study to draw this.

There are a lot of reasons why
polling has become so inaccurate.

Our societies have become really diverse,

which makes it difficult for pollsters
to get a really nice representative sample

of the population for their polls.

People are really reluctant to answer
their phones to pollsters,

and also, shockingly enough,
people might lie.

But you wouldn’t necessarily
know that to look at the media.

For one thing, the probability
of a Hillary Clinton win

was communicated with decimal places.

We don’t use decimal places
to describe the temperature.

How on earth can predicting the behavior
of 230 million voters in this country

be that precise?
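To put a rough number on that imprecision, here is a minimal Python sketch of the textbook 95 percent margin of error for a proportion estimated from a simple random sample. The sample size of 1,000 and the 50/50 split are illustrative assumptions, not figures from any real poll, and the formula ignores the sampling, non-response and honesty problems described above, so real polling error is usually larger.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, candidate at 50 percent.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} percentage points")
```

Even in this idealized case the uncertainty is about three percentage points either way, so a forecast quoted to a decimal place claims far more precision than the sample can support.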

And then there were those sleek charts.

See, a lot of data visualizations
will overstate certainty, and it works –

these charts can numb
our brains to criticism.

When you hear a statistic,
you might feel skeptical.

As soon as it’s buried in a chart,

it feels like some kind
of objective science,

and it’s not.

So I was trying to find ways
to better communicate this to people,

to show people the uncertainty
in our numbers.

What I did was start taking
real data sets

and turning them into
hand-drawn visualizations,

so that people can see
how imprecise the data is;

so people can see that a human did this,

a human found the data and visualized it.

For example, instead
of finding out the probability

of getting the flu in any given month,

you can see the rough
distribution of flu season.

This is –

(Laughter)

a bad shot to show in February.

But it’s also more responsible
data visualization,

because if you were to show
the exact probabilities,

maybe that would encourage
people to get their flu jabs

at the wrong time.

The point of these shaky lines

is so that people remember
these imprecisions,

but also so they don’t necessarily
walk away with a specific number,

but instead remember important facts.

Facts like injustice and inequality
leave a huge mark on our lives.

Facts like Black Americans and Native
Americans have shorter life expectancies

than those of other races,

and that isn’t changing anytime soon.

Facts like prisoners in the US
can be kept in solitary confinement cells

that are smaller than the size
of an average parking space.

The point of these visualizations
is also to remind people

of some really important
statistical concepts,

concepts like averages.

So let’s say you hear a claim like,

“The average swimming pool in the US
contains 6.23 fecal accidents.”

That doesn’t mean every single
swimming pool in the country

contains exactly 6.23 turds.

So in order to show that,

I went back to the original data,
which comes from the CDC,

who surveyed 47 swimming facilities.

And I just spent one evening
redistributing poop.

So you can kind of see
how misleading averages can be.

(Laughter)
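The same point about averages can be sketched in a few lines of Python. The counts below are invented for illustration and are not the CDC survey data; they only show how one extreme value can pull the mean well above what a typical pool contains.

```python
from statistics import mean, median

# Hypothetical per-pool counts, invented for illustration (not the CDC data).
counts = [0, 0, 0, 1, 1, 2, 3, 33]

avg = mean(counts)      # pulled upward by the single extreme pool
mid = median(counts)    # value of the "middle" pool
below = sum(c < avg for c in counts)  # pools with fewer than the average

print(f"mean = {avg:.2f}, median = {mid:.2f}")
print(f"{below} of {len(counts)} pools are below the mean")
```

Here the mean is five times the median, and seven of the eight hypothetical pools sit below the "average" pool.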

OK, so the second question
that you guys should be asking yourselves

to spot bad numbers is:

Can I see myself in the data?

This question is also
about averages in a way,

because part of the reason
why people are so frustrated

with these national statistics

is they don’t really tell the story
of who’s winning and who’s losing

from national policy.

It’s easy to understand why people
are frustrated with global averages

when they don’t match up
with their personal experiences.

I wanted to show people the way
data relates to their everyday lives.

I started this advice column
called “Dear Mona,”

where people would write to me
with questions and concerns

and I’d try to answer them with data.

People asked me anything.

Questions like, “Is it normal to sleep
in a separate bed to my wife?”

“Do people regret their tattoos?”

“What does it mean to die
of natural causes?”

All of these questions are great,
because they make you think

about ways to find
and communicate these numbers.

If someone asks you,
“How much pee is a lot of pee?”

which is a question that I got asked,

you really want to make sure
that the visualization makes sense

to as many people as possible.

These numbers aren’t unavailable.

Sometimes they’re just buried
in the appendix of an academic study.

And they’re certainly not inscrutable;

if you really wanted to test
these numbers on urination volume,

you could grab a bottle
and try it for yourself.

(Laughter)

The point of this isn’t necessarily

that every single data set
has to relate specifically to you.

I’m interested in how many women
were issued fines in France

for wearing the face veil, or the niqab,

even if I don’t live in France
or wear the face veil.

The point of asking where you fit in
is to get as much context as possible.

So it’s about zooming out
from one data point,

like “the unemployment rate
is five percent,”

and seeing how it changes over time,

or seeing how it changes
by educational status –

this is why your parents always
wanted you to go to college –

or seeing how it varies by gender.

Nowadays, the male unemployment rate is higher

than the female unemployment rate.

Up until the early ’80s,
it was the other way around.

This is a story of one
of the biggest changes

that’s happened in American society,

and it’s all there in that chart,
once you look beyond the averages.

The axes are everything;

once you change the scale,
you can change the story.

OK, so the third and final question
that I want you guys to think about

when you’re looking at statistics is:

How was the data collected?

So far, I’ve only talked about the way
data is communicated,

but the way it’s collected
matters just as much.

I know this is tough,

because methodologies can be opaque
and actually kind of boring,

but there are some simple steps
you can take to check this.

I’ll use one last example here.

One poll found that 41 percent of Muslims
in this country support jihad,

which is obviously pretty scary,

and it was reported everywhere in 2015.

When I want to check a number like that,

I’ll start off by finding
the original questionnaire.

It turns out that journalists
who reported on that statistic

ignored a question
lower down on the survey

that asked respondents
how they defined “jihad.”

And most of them defined it as,

“Muslims’ personal, peaceful struggle
to be more religious.”

Only 16 percent defined it as,
“violent holy war against unbelievers.”

This is the really important point:

based on those numbers,
it’s totally possible

that no one in the survey
who defined it as violent holy war

also said they support it.

Those two groups might not overlap at all.
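The arithmetic behind that claim is a simple bound: if 41 percent say they support jihad and 16 percent define it as violent holy war, the smallest possible overlap is max(0, 41 + 16 − 100) percent, which is zero. The Python sketch below computes the full range; it assumes both figures are shares of the same set of respondents and ignores rounding.

```python
def overlap_bounds(share_a: float, share_b: float) -> tuple[float, float]:
    """Smallest and largest possible share of respondents in both groups,
    given only each group's share of the same population (in percent)."""
    lowest = max(0.0, share_a + share_b - 100.0)  # forced overlap, if any
    highest = min(share_a, share_b)               # smaller group fully inside
    return lowest, highest

# Shares quoted in the talk: 41% "support jihad", 16% define it as violent.
low, high = overlap_bounds(41, 16)
print(f"Respondents in both groups: between {low:.0f}% and {high:.0f}%")
```

Because 41 + 16 is well under 100, the published numbers are consistent with anywhere from none to all of that 16 percent also saying they support jihad; the headline figure alone cannot tell you which.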

It’s also worth asking
how the survey was carried out.

This was something called an opt-in poll,

which means anyone could have found it
on the internet and completed it.

There’s no way of knowing
if those people even identified as Muslim.

And finally, there were 600
respondents in that poll.

There are roughly three million
Muslims in this country,

according to the Pew Research Center.

That means the poll spoke to roughly
one in every 5,000 Muslims

in this country.
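The ratio is easy to check for yourself. This sketch uses only the two figures quoted in the talk, 600 respondents and roughly three million Muslims; because the poll was opt-in, those 600 people were not a random draw from that population, so the ratio says nothing about how representative they were.

```python
respondents = 600
population = 3_000_000  # rough Pew Research Center estimate quoted in the talk

ratio = population / respondents  # people in the population per respondent
print(f"about one respondent for every {ratio:,.0f} people")
```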

This is one of the reasons

why government statistics
are often better than private statistics.

A poll might speak to a couple
hundred people, maybe a thousand,

or if you’re L’Oréal, trying to sell
skin care products in 2005,

you speak to 48 women
to claim that the products work.

(Laughter)

Private companies don’t have a huge
interest in getting the numbers right;

they just need the right numbers.

Government statisticians aren’t like that.

In theory, at least,
they’re totally impartial,

not least because most of them do
their jobs regardless of who’s in power.

They’re civil servants.

And to do their jobs properly,

they don’t just speak
to a couple hundred people.

Those unemployment numbers
I keep on referencing

come from the Bureau of Labor Statistics,

and to make their estimates,

they speak to over 140,000
businesses in this country.

I get it, it’s frustrating.

If you want to test a statistic
that comes from a private company,

you can buy the face cream for yourself
and a bunch of friends, test it out,

and if it doesn’t work,
you can say the numbers were wrong.

But how do you question
government statistics?

You just keep checking everything.

Find out how they collected the numbers.

Find out if you’re seeing everything
on the chart you need to see.

But don’t give up on the numbers
altogether, because if you do,

we’ll be making public policy
decisions in the dark,

using nothing but private
interests to guide us.

Thank you.

(Applause)