The human insights missing from big data | Tricia Wang

In ancient Greece,

when anyone from slaves to soldiers,
poets and politicians,

needed to make a big decision
on life’s most important questions,

like, “Should I get married?”

or “Should we embark on this voyage?”

or “Should our army
advance into this territory?”

they all consulted the oracle.

So this is how it worked:

you would bring her a question
and you would get on your knees,

and then she would go into this trance.

It would take a couple of days,

and then eventually
she would come out of it,

giving you her predictions as your answer.

From the oracle bones of ancient China

to ancient Greece to Mayan calendars,

people have craved prophecy

in order to find out
what’s going to happen next.

And that’s because we all want
to make the right decision.

We don’t want to miss something.

The future is scary,

so it’s much nicer
knowing that we can make a decision

with some assurance of the outcome.

Well, we have a new oracle,

and its name is big data,

or we call it “Watson”
or “deep learning” or “neural net.”

And these are the kinds of questions
we ask of our oracle now,

like, “What’s the most efficient way
to ship these phones

from China to Sweden?”

Or, “What are the odds

of my child being born
with a genetic disorder?”

Or, “What is the sales volume
we can predict for this product?”

I have a dog. Her name is Elle,
and she hates the rain.

And I have tried everything
to untrain her.

But because I have failed at this,

I also have to consult
an oracle, called Dark Sky,

every time before we go on a walk,

for very accurate weather predictions
in the next 10 minutes.

She’s so sweet.

So because of all of this,
our oracle is a $122 billion industry.

Now, despite the size of this industry,

the returns are surprisingly low.

Investing in big data is easy,

but using it is hard.

Over 73 percent of big data projects
aren’t even profitable,

and I have executives
coming up to me saying,

“We’re experiencing the same thing.

We invested in some big data system,

and our employees aren’t making
better decisions.

And they’re certainly not coming up
with more breakthrough ideas.”

So this is all really interesting to me,

because I’m a technology ethnographer.

I study and I advise companies

on the patterns
of how people use technology,

and one of my interest areas is data.

So why is having more data
not helping us make better decisions,

especially for companies
who have all these resources

to invest in these big data systems?

Why isn’t it getting any easier for them?

So, I’ve witnessed the struggle firsthand.

In 2009, I started
a research position with Nokia.

And at the time,

Nokia was one of the largest
cell phone companies in the world,

dominating emerging markets
like China, Mexico and India –

all places where I had done
a lot of research

on how low-income people use technology.

And I spent a lot of extra time in China

getting to know the informal economy.

So I did things like working
as a street vendor

selling dumplings to construction workers.

Or I did fieldwork,

spending nights and days
in internet cafés,

hanging out with Chinese youth,
so I could understand

how they were using
games and mobile phones

and using them while moving
from the rural areas to the cities.

Through all of this qualitative evidence
that I was gathering,

I was starting to see so clearly

that a big change was about to happen
among low-income Chinese people.

Even though they were surrounded
by advertisements for luxury products

like fancy toilets –
who wouldn’t want one? –

and apartments and cars,

through my conversations with them,

I found out that the ads
that actually enticed them the most

were the ones for iPhones,

promising them this entry
into this high-tech life.

And even when I was living with them
in urban slums like this one,

I saw people investing
over half of their monthly income

into buying a phone,

and increasingly, they were “shanzhai,”

which are affordable knock-offs
of iPhones and other brands.

They’re very usable.

Does the job.

And after years of living
with migrants and working with them

and just really doing everything
that they were doing,

I started piecing
all these data points together –

from the things that seem random,
like me selling dumplings,

to the things that were more obvious,

like tracking how much they were spending
on their cell phone bills.

And I was able to create
this much more holistic picture

of what was happening.

And that’s when I started to realize

that even the poorest in China
would want a smartphone,

and that they would do almost anything
to get their hands on one.

You have to keep in mind,

iPhones had just come out, it was 2009,

so this was, like, eight years ago,

and Androids had just started
looking like iPhones.

And a lot of very smart
and realistic people said,

“Those smartphones – that’s just a fad.

Who wants to carry around
these heavy things

where batteries drain quickly
and they break every time you drop them?”

But I had a lot of data,

and I was very confident
about my insights,

so I was very excited
to share them with Nokia.

But Nokia was not convinced,

because it wasn’t big data.

They said, “We have
millions of data points,

and we don’t see any indicators
of anyone wanting to buy a smartphone,

and your data set of 100,
as diverse as it is, is too weak

for us to even take seriously.”

And I said, “Nokia, you’re right.

Of course you wouldn’t see this,

because you’re sending out surveys
assuming that people don’t know

what a smartphone is,

so of course you’re not going
to get any data back

about people wanting to buy
a smartphone in two years.

Your surveys, your methods
have been designed

to optimize an existing business model,

and I’m looking
at these emergent human dynamics

that haven’t happened yet.

We’re looking outside of market dynamics

so that we can get ahead of it.”

Well, you know what happened to Nokia?

Their business fell off a cliff.

This – this is the cost
of missing something.

It was unfathomable.

But Nokia’s not alone.

I see organizations
throwing out data all the time

because it didn’t come from a quant model

or it doesn’t fit in one.

But it’s not big data’s fault.

It’s the way we use big data;
it’s our responsibility.

Big data’s reputation for success

comes from quantifying
very specific environments,

like electricity power grids
or delivery logistics or genetic code,

when we’re quantifying in systems
that are more or less contained.

But not all systems
are as neatly contained.

When you’re quantifying
and systems are more dynamic,

especially systems
that involve human beings,

forces are complex and unpredictable,

and these are things
that we don’t know how to model so well.

Once you predict something
about human behavior,

new factors emerge,

because conditions
are constantly changing.

That’s why it’s a never-ending cycle.

You think you know something,

and then something unknown
enters the picture.

And that’s why just relying
on big data alone

increases the chance
that we’ll miss something,

while giving us this illusion
that we already know everything.

And what makes it really hard
to see this paradox

and even wrap our brains around it

is that we have this thing
that I call the quantification bias,

which is the unconscious belief
of valuing the measurable

over the immeasurable.

And we often experience this at our work.

Maybe we work alongside
colleagues who are like this,

or even our whole entire
company may be like this,

where people become
so fixated on that number,

that they can’t see anything
outside of it,

even when you present them evidence
right in front of their face.

And this is a very appealing message,

because there’s nothing
wrong with quantifying;

it’s actually very satisfying.

I get a great sense of comfort
from looking at an Excel spreadsheet,

even very simple ones.

(Laughter)

It’s just kind of like,

“Yes! The formula worked. It’s all OK.
Everything is under control.”

But the problem is

that quantifying is addictive.

And when we forget that

and when we don’t have something
to kind of keep that in check,

it’s very easy to just throw out data

because it can’t be expressed
as a numerical value.

It’s very easy just to slip
into silver-bullet thinking,

as if some simple solution existed.

Because this is a great moment of danger
for any organization,

because oftentimes,
the future we need to predict –

it isn’t in that haystack,

but it’s that tornado
that’s bearing down on us

outside of the barn.

There is no greater risk

than being blind to the unknown.

It can cause you to make
the wrong decisions.

It can cause you to miss something big.

But we don’t have to go down this path.

It turns out that the oracle
of ancient Greece

holds the secret key
that shows us the path forward.

Now, recent geological research has shown

that the Temple of Apollo,
where the most famous oracle sat,

was actually built
over two earthquake faults.

And these faults would release
these petrochemical fumes

from underneath the Earth’s crust,

and the oracle literally sat
right above these faults,

inhaling enormous amounts
of ethylene gas from these fissures.

(Laughter)

It’s true.

(Laughter)

It’s all true, and that’s what made her
babble and hallucinate

and go into this trance-like state.

She was high as a kite!

(Laughter)

So how did anyone –

How did anyone get
any useful advice out of her

in this state?

Well, you see those people
surrounding the oracle?

You see those people holding her up,

because she’s, like, a little woozy?

And you see that guy
on your left-hand side

holding the orange notebook?

Well, those were the temple guides,

and they worked hand in hand
with the oracle.

When inquisitors would come
and get on their knees,

that’s when the temple guides
would get to work,

because after they asked her questions,

they would observe their emotional state,

and then they would ask them
follow-up questions,

like, “Why do you want to know
this prophecy? Who are you?

What are you going to do
with this information?”

And then the temple guides would take
this more ethnographic,

this more qualitative information,

and interpret the oracle’s babblings.

So the oracle didn’t stand alone,

and neither should our big data systems.

Now to be clear,

I’m not saying that big data systems
are huffing ethylene gas,

or that they’re even giving
invalid predictions.

The total opposite.

But what I am saying

is that in the same way
that the oracle needed her temple guides,

our big data systems need them, too.

They need people like ethnographers
and user researchers

who can gather what I call thick data.

This is precious data from humans,

like stories, emotions and interactions
that cannot be quantified.

It’s the kind of data
that I collected for Nokia

that comes in the form
of a very small sample size,

but delivers incredible depth of meaning.

And what makes it so thick and meaty

is the experience of understanding
the human narrative.

And that’s what helps to see
what’s missing in our models.

Thick data grounds our business questions
in human questions,

and that’s why integrating
big and thick data

forms a more complete picture.

Big data is able to offer
insights at scale

and leverage the best
of machine intelligence,

whereas thick data can help us
rescue the context loss

that comes from making big data usable,

and leverage the best
of human intelligence.

And when you actually integrate the two,
that’s when things get really fun,

because then you’re no longer
just working with data

you’ve already collected.

You get to also work with data
that hasn’t been collected.

You get to ask questions about why:

Why is this happening?

Now, when Netflix did this,

they unlocked a whole new way
to transform their business.

Netflix is known for their really great
recommendation algorithm,

and they had this $1 million prize
for anyone who could improve it.

And there were winners.

But Netflix discovered
the improvements were only incremental.

So to really find out what was going on,

they hired an ethnographer,
Grant McCracken,

to gather thick data insights.

And what he discovered was something
that they hadn’t seen initially

in the quantitative data.

He discovered that people loved
to binge-watch.

In fact, people didn’t even
feel guilty about it.

They enjoyed it.

(Laughter)

So Netflix was like,
“Oh. This is a new insight.”

So they went to their data science team,

and they were able to scale
this big data insight

in with their quantitative data.

And once they verified it
and validated it,

Netflix decided to do something
very simple but impactful.

They said, instead of offering
the same show from different genres

or more of the different shows
from similar users,

we’ll just offer more of the same show.

We’ll make it easier
for you to binge-watch.

And they didn’t stop there.

They did all these things

to redesign their entire
viewer experience,

to really encourage binge-watching.

It’s why people and friends disappear
for whole weekends at a time,

catching up on shows
like “Master of None.”

By integrating big data and thick data,
they not only improved their business,

but they transformed how we consume media.

And now their stocks are projected
to double in the next few years.

But this isn’t just about
watching more videos

or selling more smartphones.

For some, integrating thick data
insights into the algorithm

could mean life or death,

especially for the marginalized.

All around the country,
police departments are using big data

for predictive policing,

to set bond amounts
and sentencing recommendations

in ways that reinforce existing biases.

NSA’s Skynet machine learning algorithm

has possibly aided in the deaths
of thousands of civilians in Pakistan

from misreading cellular device metadata.

As all of our lives become more automated,

from automobiles to health insurance
or to employment,

it is likely that all of us

will be impacted
by the quantification bias.

Now, the good news
is that we’ve come a long way

from huffing ethylene gas
to make predictions.

We have better tools,
so let’s just use them better.

Let’s integrate the big data
with the thick data.

Let’s bring our temple guides
with the oracles,

and whether this work happens
in companies or nonprofits

or government or even in the software,

all of it matters,

because that means
we’re collectively committed

to making better data,

better algorithms, better outputs

and better decisions.

This is how we’ll avoid
missing that something.

(Applause)
