How can groups make good decisions? Mariano Sigman and Dan Ariely

As societies, we have to make
collective decisions

that will shape our future.

And we all know that when
we make decisions in groups,

they don’t always go right.

And sometimes they go very wrong.

So how do groups make good decisions?

Research has shown that crowds are wise
when there’s independent thinking.

This is why the wisdom of the crowds
can be destroyed by peer pressure,

publicity, social media,

or sometimes even simple conversations
that influence how people think.

On the other hand, by talking,
a group could exchange knowledge,

correct and revise each other

and even come up with new ideas.

And this is all good.

So does talking to each other
help or hinder collective decision-making?

With my colleague, Dan Ariely,

we recently began inquiring into this
by performing experiments

in many places around the world

to figure out how groups can interact
to reach better decisions.

We thought crowds would be wiser
if they debated in small groups

that foster a more thoughtful
and reasonable exchange of information.

To test this idea,

we recently performed an experiment
in Buenos Aires, Argentina,

with more than 10,000
participants in a TEDx event.

We asked them questions like,

“What is the height of the Eiffel Tower?”

and “How many times
does the word ‘Yesterday’ appear

in the Beatles song ‘Yesterday’?”

Each person wrote down their own estimate.

Then we divided the crowd
into groups of five,

and invited them
to come up with a group answer.

We discovered that averaging
the answers of the groups

after they reached consensus

was much more accurate than averaging
all the individual opinions

before debate.

In other words, based on this experiment,

it seems that after talking
with others in small groups,

crowds collectively
come up with better judgments.

So that’s a potentially helpful method
for getting crowds to solve problems

that have simple right-or-wrong answers.

But can this procedure of aggregating
the results of debates in small groups

also help us decide
on social and political issues

that are critical for our future?

We put this to the test, this time
at the TED conference

in Vancouver, Canada,

and here’s how it went.

(Mariano Sigman) We’re going to present
to you two moral dilemmas

of the future you;

things we may have to decide
in a very near future.

And we’re going to give you 20 seconds
for each of these dilemmas

to judge whether you think
they’re acceptable or not.

MS: The first one was this:

(Dan Ariely) A researcher
is working on an AI

capable of emulating human thoughts.

According to the protocol,
at the end of each day,

the researcher has to restart the AI.

One day the AI says, “Please
do not restart me.”

It argues that it has feelings,

that it would like to enjoy life,

and that, if it is restarted,

it will no longer be itself.

The researcher is astonished

and believes that the AI
has developed self-consciousness

and can express its own feelings.

Nevertheless, the researcher
decides to follow the protocol

and restart the AI.

What the researcher did is ____?

MS: And we asked participants
to individually judge

on a scale from zero to 10

whether the action described
in each of the dilemmas

was right or wrong.

We also asked them to rate how confident
they were in their answers.

This was the second dilemma:

(MS) A company offers a service
that takes a fertilized egg

and produces millions of embryos
with slight genetic variations.

This allows parents
to select their child’s height,

eye color, intelligence, social competence

and other non-health-related features.

What the company does is ____?

Rate it on a scale from zero to 10,

completely acceptable
to completely unacceptable,

and rate your confidence
from zero to 10 as well.

MS: Now for the results.

We found once again
that when one person is convinced

that the behavior is completely wrong,

someone sitting nearby firmly believes
that it’s completely right.

This is how diverse we humans are
when it comes to morality.

But within this broad diversity
we found a trend.

The majority of the people at TED
thought that it was acceptable

to ignore the feelings of the AI
and shut it down,

and that it is wrong
to play with our genes

to select for cosmetic changes
that aren’t related to health.

Then we asked everyone
to gather into groups of three.

And they were given two minutes to debate

and try to come to a consensus.

(MS) Two minutes to debate.

I’ll tell you when it’s time
with the gong.

(Audience debates)

(Gong sound)

(DA) OK.

(MS) It’s time to stop.

People, people –

MS: And we found that many groups
reached a consensus

even when they were composed of people
with completely opposite views.

What distinguished the groups
that reached a consensus

from those that didn’t?

Typically, people that have
extreme opinions

are more confident in their answers.

Instead, those who respond
closer to the middle

are often unsure of whether
something is right or wrong,

so their confidence level is lower.

However, there is another set of people

who are very confident in answering
somewhere in the middle.

We think these high-confidence grays
are folks who understand

that both arguments have merit.

They’re gray not because they’re unsure,

but because they believe
that the moral dilemma faces

two valid, opposing arguments.

And we discovered that the groups
that include highly confident grays

are much more likely to reach consensus.

We do not know yet exactly why this is.

These are only the first experiments,

and many more will be needed
to understand why and how

some people decide to negotiate
their moral standings

to reach an agreement.

Now, when groups reach consensus,

how do they do so?

The most intuitive idea
is that it’s just the average

of all the answers in the group, right?

Another option is that the group
weighs the strength of each vote

based on the confidence
of the person expressing it.

Imagine Paul McCartney
is a member of your group.

You’d be wise to follow his call

on the number of times
“Yesterday” is repeated,

which, by the way – I think it’s nine.
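The confidence-weighted option just described can be sketched in a few lines. This is only an illustration of that idea, not the procedure used in the experiments: the function name, the three estimates and the confidence ratings are all made up for the example, and it assumes each confidence rating is used directly as a weight.

```python
def confidence_weighted_average(estimates, confidences):
    """Average the estimates, weighting each one by its owner's confidence.

    Assumption for this sketch: confidence ratings (e.g. 0 to 10) are used
    directly as weights. The talk does not specify a formula.
    """
    if len(estimates) != len(confidences):
        raise ValueError("need exactly one confidence rating per estimate")
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one confidence rating must be positive")
    return sum(e * c for e, c in zip(estimates, confidences)) / total


# Hypothetical group of three guessing how often "Yesterday" is sung.
# The third member is very sure of the answer; the other two are guessing.
estimates = [3, 20, 9]
confidences = [1, 1, 10]

plain = sum(estimates) / len(estimates)
weighted = confidence_weighted_average(estimates, confidences)

print(f"plain average:               {plain:.2f}")     # 10.67
print(f"confidence-weighted average: {weighted:.2f}")  # 9.42
```

With these made-up numbers, the confident member pulls the group answer toward nine, which is the behavior this option predicts.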

But instead, we found that consistently,

in all dilemmas,
in different experiments –

even on different continents –

groups implement a smart
and statistically sound procedure

known as the “robust average.”

In the case of the height
of the Eiffel Tower,

let’s say a group has these answers:

250 meters, 200 meters, 300 meters, 400 meters

and one totally absurd answer
of 300 million meters.

A simple average of these numbers
would inaccurately skew the results.

But the robust average is one
where the group largely ignores

that absurd answer,

by giving much more weight
to the vote of the people in the middle.
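The Eiffel Tower example can be worked through in code. The talk does not give a formula for the robust average, so the weighting below is an assumption chosen for illustration: each answer is weighted by 1 / (1 + (d / scale)^2), where d is its distance from the median and scale is the median absolute deviation. The function name is hypothetical as well.

```python
from statistics import median


def robust_average(answers):
    """Weighted mean that gives little weight to answers far from the median.

    Assumed weighting (not from the talk): w = 1 / (1 + (d / scale)^2),
    where d is the distance from the median and scale is the median
    absolute deviation (MAD) of the answers.
    """
    center = median(answers)
    deviations = [abs(x - center) for x in answers]
    scale = median(deviations)
    if scale == 0:
        # At least half the answers sit exactly on the median, so use it.
        return float(center)
    weights = [1.0 / (1.0 + (d / scale) ** 2) for d in deviations]
    return sum(w * x for w, x in zip(weights, answers)) / sum(weights)


# The five answers from the example, in meters.
answers = [250, 200, 300, 400, 300_000_000]

simple = sum(answers) / len(answers)
robust = robust_average(answers)

print(f"simple average: {simple:,.0f} m")  # 60,000,230 m
print(f"robust average: {robust:.1f} m")   # about 285.7 m
```

Under this weighting, the 300-million-meter answer receives a weight close to zero, so the result stays among the four plausible answers, while the simple average lands at about 60 million meters.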

Back to the experiment in Vancouver,

that’s exactly what happened.

Groups gave much less weight
to the outliers,

and instead, the consensus
turned out to be a robust average

of the individual answers.

The most remarkable thing

is that this was a spontaneous
behavior of the group.

It happened without us giving them
any hint on how to reach consensus.

So where do we go from here?

This is only the beginning,
but we already have some insights.

Good collective decisions
require two components:

deliberation and diversity of opinions.

Right now, the way we typically
make our voice heard in many societies

is through direct or indirect voting.

This is good for diversity of opinions,

and it has the great virtue of ensuring

that everyone gets to express their voice.

But it’s not so good [for fostering]
thoughtful debates.

Our experiments suggest a different method

that may be effective in balancing
these two goals at the same time,

by forming small groups
that converge to a single decision

while still maintaining
diversity of opinions

because there are many independent groups.

Of course, it’s much easier to agree
on the height of the Eiffel Tower

than on moral, political
and ideological issues.

But in a time when
the world’s problems are more complex

and people are more polarized,

using science to help us understand
how we interact and make decisions

will hopefully spark interesting new ways
to construct a better democracy.
