Inside OkCupid: The math of online dating (Christian Rudder)

Translator: Andrea McDonough
Reviewer: Bedirhan Cinar

Hello, my name is Christian Rudder,

and I was one of the founders of OkCupid.

It’s now one of the biggest
dating sites in the United States.

Like most everyone at the site,
I was a math major,

so, as you may expect, we’re known
for the analytic approach we take to love.

We call it our matching algorithm.

Basically, OkCupid’s matching
algorithm helps us decide

whether two people should go on a date.

We built our entire business around it.

Now, algorithm is a fancy word,

and people like to drop it
like it’s this big thing.

But really, an algorithm
is just a systematic,

step-by-step way to solve a problem.

It doesn’t have to be fancy at all.

Here in this lesson,

I’m going to explain how we arrived
at our particular algorithm,

so you can see how it’s done.

Now, why are algorithms even important?

Why does this lesson even exist?

Well, notice one very significant
phrase I used above:

they are a step-by-step
way to solve a problem,

and as you probably know, computers
excel at step-by-step processes.

A computer without an algorithm

is basically an expensive paperweight.

And since computers are such
a pervasive part of everyday life,

algorithms are everywhere.

The math behind OkCupid’s matching
algorithm is surprisingly simple.

It’s just some addition, multiplication,
a little bit of square roots.

The tricky part in designing it

was figuring out how to take
something mysterious,

human attraction,

and break it into components
that a computer can work with.

The first thing we needed
to match people up was data,

something for the algorithm to work with.

The best way to get data quickly
from people is to just ask for it.

So we decided that OkCupid
should ask users questions,

stuff like, “Do you want
to have kids one day?”

“How often do you brush your teeth?”

“Do you like scary movies?”

And big stuff like,
“Do you believe in God?”

Now, a lot of the questions
are good for matching like with like,

that is, when both people
answer the same way.

For example, two people
who are both into scary movies

are probably a better match
than one person who is and one who isn’t.

But what about a question like,

“Do you like to be
the center of attention?”

If both people in a relationship
are saying yes to this,

they’re going to have massive problems.

We realized this early on,

and so we decided we needed
a bit more data from each question.

We had to ask people to specify
not only their own answer,

but the answer they wanted
from someone else.

That worked really well.

But we needed one more dimension.

Some questions tell you more
about a person than others.

For example, a question
about politics, something like,

“Which is worse:
book burning or flag burning?”

might reveal more about someone
than their taste in movies.

And it doesn’t make sense
to weigh all things equally,

so we added one final data point.

For everything that OkCupid asks you,

you have a chance to tell us
the role it plays in your life.

And this ranges
from irrelevant to mandatory.

So now, for every question,
we have three things for our algorithm:

first, your answer;

second, how you want someone else –
your potential match – to answer;

and third, how important
the question is to you at all.
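
As a rough sketch, those three pieces of data could be held in one small record per question. The names here (`Response`, `answer`, `wanted`, `importance`) are assumptions made up for illustration, not OkCupid's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One person's data for one question (hypothetical structure)."""
    answer: str      # first: your own answer
    wanted: str      # second: the answer you want from a potential match
    importance: str  # third: how important the question is to you


# Hypothetical example: someone who likes scary movies
# and would like a match who likes them too.
scary_movies = Response(
    answer="yes",
    wanted="yes",
    importance="somewhat important",
)
print(scary_movies)
```

Keeping `answer` and `wanted` as separate fields is what lets a question like "Do you like to be the center of attention?" work, since the answer you give and the answer you hope for can differ.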

With all this information,

OkCupid can figure out
how well two people will get along.

The algorithm crunches the numbers
and gives us a result.

As a practical example,

let’s look at how we’d match you
with another person.

Let’s call him “B.”

Your match percentage with B is based
on questions you’ve both answered.

Let’s call that set
of common questions “s.”

As a very simple example,
we use a small set “s”

with just two questions in common,

and compute a match from that.

Here are our two example questions.

The first one, let’s say, is,
“How messy are you?”

And the answer possibilities are:

very messy, average and very organized.

And let’s say you answered
“very organized,”

and you’d like someone else
to answer “very organized,”

and the question is very important to you.

Basically, you’re a neat freak.

You’re neat, you want someone else
to be neat, and that’s it.

And let’s say B is a little bit different.

He answered “very organized” for himself,

but “average” is OK with him
as an answer from someone else,

and the question is only
a little important to him.

Let’s look at the second question,
from our previous example:

“Do you like to be
the center of attention?”

The answers are “yes” and “no.”

You’ve answered “no,” you want
someone else to answer “no,”

and the question is only
a little important to you.

Now B, he’s answered “yes.”

He wants someone else to answer “no,”

because he wants the spotlight on him,

and the question is somewhat
important to him.

So, let’s try to compute all of this.

Our first step is, since we use
computers to do this,

we need to assign numerical values

to ideas like “somewhat
important” and “very important,”

because computers need
everything in numbers.

We at OkCupid decided
on the following scale:

“Irrelevant” is worth 0.

“A little important” is worth 1.

“Somewhat important” is worth 10.

“Very important” is 50.

And “absolutely mandatory” is 250.
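
Written down for a computer, that scale might look like a simple lookup table. This is only a sketch: the names `IMPORTANCE_POINTS` and `points_for` are assumptions, while the point values are the ones listed above.

```python
# Importance labels mapped to the point values from the scale above.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


def points_for(label: str) -> int:
    """Look up the points for an importance label, ignoring case and spacing."""
    return IMPORTANCE_POINTS[label.strip().lower()]


print(points_for("Very important"))  # prints 50
```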

Next, the algorithm makes
two simple calculations.

The first is: How much did
B’s answers satisfy you?

That is, how many possible points
did B score on your scale?

Well, you indicated that B’s answer
to the first question,

about messiness,

was very important to you.

It’s worth 50 points and B got that right.

The second question is worth only 1,

because you said
it was only a little important.

B got that wrong,

so B’s answers were 50
out of 51 possible points.

That’s 98 percent satisfactory. Pretty good.

The second question the algorithm
looks at is: How much did you satisfy B?

Well, B placed 1 point on your answer
to the messiness question

and 10 on your answer to the second.

Of those 11, that’s 1 plus 10,
you earned 10 –

your answer satisfied B
on the second question.

So your answers were 10 out of 11
equals 91 percent satisfactory to B.

That’s not bad.
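
Both calculations can be sketched with one small Python function applied in each direction. The function name `satisfaction`, the dictionary layout and the short question keys are assumptions for illustration; the answers, preferences and point values come from the example above.

```python
# Point values from the importance scale.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


def satisfaction(prefs, other_answers):
    """Share of one person's possible points that the other person earned.

    prefs maps question -> (answer wanted from a match, importance label).
    other_answers maps question -> the other person's own answer.
    """
    earned = 0
    possible = 0
    for question, (wanted, importance) in prefs.items():
        weight = IMPORTANCE_POINTS[importance]
        possible += weight
        if other_answers[question] == wanted:
            earned += weight
    # Assumption: with no points at stake, report 0 rather than divide by zero.
    return earned / possible if possible else 0.0


# The two example questions, keyed by hypothetical short names.
your_answers = {"messy": "very organized", "attention": "no"}
your_prefs = {
    "messy": ("very organized", "very important"),
    "attention": ("no", "a little important"),
}
b_answers = {"messy": "very organized", "attention": "yes"}
b_prefs = {
    "messy": ("average", "a little important"),
    "attention": ("no", "somewhat important"),
}

b_satisfies_you = satisfaction(your_prefs, b_answers)  # 50 of 51 points
you_satisfy_b = satisfaction(b_prefs, your_answers)    # 10 of 11 points
print(f"{b_satisfies_you:.0%} {you_satisfy_b:.0%}")    # prints: 98% 91%
```

Note that the two scores use different scales: each person is graded on the points the other person put at stake, which is why the same pair of answers can give 98 percent one way and 91 percent the other.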

The final step is to take
these two match percentages

and get one number for the both of you.

To do this, the algorithm
multiplies your scores,

then takes the nth root,

where “n” is the number of questions.

Because s, the set
of questions in this sample,

contains only 2,

we have: match percentage
equals the square root

of 98 percent times 91 percent.

That equals 94 percent.

That 94 percent is your match
percentage with B.
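
For this two-question example, the last step can be checked in a few lines of Python. The function name `match_percentage` is an assumption; the inputs are the 50-out-of-51 and 10-out-of-11 scores worked out above, and the sketch leaves out any margin-of-error correction.

```python
import math


def match_percentage(a_satisfied, b_satisfied):
    """Multiply the two satisfaction scores, then take the square root."""
    return math.sqrt(a_satisfied * b_satisfied)


b_satisfies_you = 50 / 51  # about 98 percent
you_satisfy_b = 10 / 11    # about 91 percent

match = match_percentage(b_satisfies_you, you_satisfy_b)
print(f"{match:.0%}")  # prints: 94%
```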

It’s a mathematical expression
of how happy you’d be with each other,

based on what we know.

Now, why does the algorithm multiply,

as opposed to, say, average
the two match scores together,

and do the square-root business?

In general, this formula
is called the geometric mean.

It’s a great way to combine
values that have wide ranges

and represent very different properties.

In other words, it’s perfect
for romantic matching.

You’ve got wide ranges and you’ve got
tons of different data points,

like I said, about movies, politics,
religion – everything.

Intuitively, too, this makes sense.

Two people satisfying
each other 50 percent

should be a better match
than two others who satisfy 0 and 100,

because affection needs to be mutual.
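
A quick comparison shows the difference. In this sketch (the function names are assumptions), a plain average cannot tell the two couples apart, while the geometric mean drops the one-sided pair to zero.

```python
import math


def arithmetic_mean(a, b):
    """Ordinary average of two scores."""
    return (a + b) / 2


def geometric_mean(a, b):
    """Square root of the product of two scores."""
    return math.sqrt(a * b)


mutual = (0.5, 0.5)     # each person satisfies the other 50 percent
one_sided = (0.0, 1.0)  # one is satisfied 0 percent, the other 100 percent

print(arithmetic_mean(*mutual), arithmetic_mean(*one_sided))  # prints: 0.5 0.5
print(geometric_mean(*mutual), geometric_mean(*one_sided))    # prints: 0.5 0.0
```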

After adding a little correction
for margin of error,

in the case where we have
a small number of questions,

like we do in this example,

we’re good to go.

Any time OkCupid matches two people,

it goes through the steps
we just outlined.

First it collects data about your answers,

then it compares your choices
and preferences to other people’s

in simple, mathematical ways.

This, the ability to take
real-world phenomena

and make them something
a microchip can understand,

is, I think, the most important skill
anyone can have these days.

Like you use sentences
to tell a story to a person,

you use algorithms
to tell a story to a computer.

If you learn the language,
you can go out and tell your stories.

I hope this will help you do that.
