How to read the genome and build a human being | Riccardo Sabatini

For the next 16 minutes,
I’m going to take you on a journey

that is probably
the biggest dream of humanity:

to understand the code of life.

So for me, everything started
many, many years ago

when I met the first 3D printer.

The concept was fascinating.

A 3D printer needs three elements:

a bit of information, some
raw material, some energy,

and it can produce any object
that was not there before.

I was doing physics,
I was coming back home

and I realized that I actually
always knew a 3D printer.

And everyone does.

It was my mom.

(Laughter)

My mom takes three elements:

a bit of information, which is between
my father and my mom in this case,

raw elements and energy
in the same media, that is food,

and after several months, produces me.

And I was not existent before.

So apart from the shock of my mom
discovering that she was a 3D printer,

I immediately got mesmerized
by that piece,

the first one, the information.

What amount of information does it take

to build and assemble a human?

Is it much? Is it little?

How many thumb drives can you fill?

Well, I was studying physics
at the beginning

and I took this approximation of a human
as a gigantic Lego piece.

So, imagine that the building
blocks are little atoms

and there is a hydrogen here,
a carbon here, a nitrogen here.

So in the first approximation,

if I can list the number of atoms
that compose a human being,

I can build it.

Now, you can run some numbers

and that happens to be
quite an astonishing number.

So the number of atoms,

the file that I will save in my thumb
drive to assemble a little baby,

will actually fill an entire Titanic
of thumb drives –

multiplied 2,000 times.

This is the miracle of life.

Every time you see from now on
a pregnant lady,

she’s assembling the biggest
amount of information

that you will ever encounter.

Forget big data, forget
anything you heard of.

This is the biggest amount
of information that exists.

(Applause)

But nature, fortunately, is much smarter
than a young physicist,

and in four billion years, managed
to pack this information

in a small crystal we call DNA.

We met it for the first time in 1950
when Rosalind Franklin,

an amazing scientist, a woman,

took a picture of it.

But it took us more than 40 years
to finally poke inside a human cell,

take out this crystal,

unroll it, and read it for the first time.

The code comes out to be
a fairly simple alphabet,

four letters: A, T, C and G.

And to build a human,
you need three billion of them.

Three billion.

How many are three billion?

It doesn’t really make
any sense as a number, right?

So I was thinking how
I could explain myself better

about how big and enormous this code is.
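
An illustrative aside, not part of the talk: one rough way to size three billion letters is to count bits. The sketch below assumes a plain, uncompressed encoding of a four-letter alphabet and a single copy of the sequence; the variable names are made up for the example.

```python
import math

GENOME_LETTERS = 3_000_000_000   # figure quoted in the talk
ALPHABET = "ATCG"                # the four letters named in the talk

# A four-letter alphabet needs log2(4) = 2 bits per letter.
bits_per_letter = math.log2(len(ALPHABET))
genome_bytes = int(GENOME_LETTERS * bits_per_letter) // 8
genome_megabytes = genome_bytes / 1_000_000

print(f"{bits_per_letter:.0f} bits per letter")
print(f"{genome_megabytes:.0f} MB for the whole sequence")
```

Under these assumptions the whole sequence comes to about 750 MB, which fits on a single thumb drive.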

But there is – I mean,
I’m going to have some help,

and the best person to help me
introduce the code

is actually the first man
to sequence it, Dr. Craig Venter.

So welcome onstage, Dr. Craig Venter.

(Applause)

Not the man in the flesh,

but for the first time in history,

this is the genome of a specific human,

printed page-by-page, letter-by-letter:

262,000 pages of information,

450 kilograms, shipped
from the United States to Canada

thanks to Bruno Bowden.
Lulu.com, a start-up, did everything.

It was an amazing feat.

But this is the visual perception
of what is the code of life.

And now, for the first time,
I can do something fun.

I can actually poke inside it and read.

So let me take an interesting
book … like this one.

I have an annotation;
it’s a fairly big book.

So just to let you see
what is the code of life.

Thousands and thousands and thousands

and millions of letters.

And they apparently make sense.

Let’s get to a specific part.

Let me read it to you:

(Laughter)

“AAG, AAT, ATA.”

To you it sounds like mute letters,

but this sequence gives
the color of the eyes to Craig.

I’ll show you another part of the book.

This is actually a little
more complicated.

Chromosome 14, book 132:

(Laughter)

As you might expect.

(Laughter)

“ATT, CTT, GATT.”

This human is lucky,

because if you miss just
two letters in this position –

two letters of our three billion –

he will be condemned
to a terrible disease:

cystic fibrosis.

We have no cure for it,
we don’t know how to solve it,

and it’s just two letters
of difference from what we are.

A wonderful book, a mighty book,

a mighty book that helped me understand

and show you something quite remarkable.

Every one of you – what makes
me, me and you, you –

is just about five million of these,

half a book.

For the rest,

we are all absolutely identical.

Five hundred pages
is the miracle of life that you are.

The rest, we all share it.

So think about that again
when we think that we are different.

This is the amount that we share.
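
An illustrative aside, not part of the talk: the figures quoted so far can be checked with a few lines of arithmetic. The sketch uses only numbers from the talk (three billion letters, 262,000 printed pages, about five million differing letters) and assumes the letters are spread evenly across the pages.

```python
GENOME_LETTERS = 3_000_000_000   # letters in the printed genome
PRINTED_PAGES = 262_000          # pages in the printed edition
DIFFERING_LETTERS = 5_000_000    # letters that differ between two people

# Assumption: every page holds the same number of letters.
letters_per_page = GENOME_LETTERS / PRINTED_PAGES
differing_pages = DIFFERING_LETTERS / letters_per_page
shared_fraction = 1 - DIFFERING_LETTERS / GENOME_LETTERS

print(f"about {letters_per_page:,.0f} letters per page")
print(f"about {differing_pages:,.0f} pages of differences")
print(f"{shared_fraction:.2%} of the letters are shared")
```

Under that assumption the differences fill a little over 400 pages, the same order of magnitude as the "five hundred pages" in the talk, and more than 99.8 percent of the letters are shared.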

So now that I have your attention,

the next question is:

How do I read it?

How do I make sense out of it?

Well, for however good you can be
at assembling Swedish furniture,

this instruction manual
is nothing you can crack in your life.

(Laughter)

And so, in 2014, two famous TEDsters,

Peter Diamandis and Craig Venter himself,

decided to assemble a new company.

Human Longevity was born,

with one mission:

trying everything we can try

and learning everything
we can learn from these books,

with one target –

making real the dream
of personalized medicine,

understanding what things
should be done to have better health

and what are the secrets in these books.

An amazing team, 40 data scientists
and many, many more people,

a pleasure to work with.

The concept is actually very simple.

We’re going to use a technology
called machine learning.

On one side, we have genomes –
thousands of them.

On the other side, we collected
the biggest database of human beings:

phenotypes, 3D scan, NMR –
everything you can think of.

Inside there, on these two opposite sides,

there is the secret of translation.

And in the middle, we build a machine.

We build a machine
and we train a machine –

well, not exactly one machine,
many, many machines –

to try to understand and translate
the genome in a phenotype.

What are those letters,
and what do they do?
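
An illustrative aside, not part of the talk: the idea of training a machine to translate genomes into phenotypes can be sketched with a toy example. Everything here is an assumption made for illustration: the data are randomly generated, the "genome" is five made-up sites, the trait is a made-up height, and the model is a plain linear regression trained by gradient descent. It is not the method used by the team described in the talk.

```python
import random

rng = random.Random(0)          # fixed seed so the toy run is repeatable

N_PEOPLE, N_SITES = 200, 5
TRUE_EFFECTS = [2.0, -1.0, 0.5, 0.0, 1.5]   # made-up effect of each site, in cm
BASELINE_CM = 170.0                          # made-up baseline height

# "Genomes": for each person, 0, 1 or 2 copies of a variant at each site.
genomes = [[rng.choice([0, 1, 2]) for _ in range(N_SITES)] for _ in range(N_PEOPLE)]
# "Phenotypes": height built from the hidden effects plus random noise.
heights = [
    BASELINE_CM + sum(e * g for e, g in zip(TRUE_EFFECTS, person)) + rng.gauss(0, 1.0)
    for person in genomes
]

# Center the data so plain gradient descent converges quickly.
mean_height = sum(heights) / N_PEOPLE
site_means = [sum(p[j] for p in genomes) / N_PEOPLE for j in range(N_SITES)]
X = [[p[j] - site_means[j] for j in range(N_SITES)] for p in genomes]
y = [h - mean_height for h in heights]

# The "machine": a linear model trained by gradient descent on squared error.
weights = [0.0] * N_SITES
LEARNING_RATE = 0.3
for _ in range(300):
    grads = [0.0] * N_SITES
    for row, target in zip(X, y):
        error = sum(w * x for w, x in zip(weights, row)) - target
        for j in range(N_SITES):
            grads[j] += 2 * error * row[j] / N_PEOPLE
    weights = [w - LEARNING_RATE * g for w, g in zip(weights, grads)]

def predict_height(person):
    """Translate one genome (list of 0/1/2 counts) into a height in cm."""
    return mean_height + sum(w * (g - m) for w, g, m in zip(weights, person, site_means))

mean_abs_error = sum(abs(predict_height(p) - h) for p, h in zip(genomes, heights)) / N_PEOPLE
print("learned effects:", [round(w, 2) for w in weights])
print("mean absolute error (cm):", round(mean_abs_error, 2))
```

The learned effects should land close to the hidden ones, which is the toy version of asking what the letters do. A real genome has millions of relevant positions and far messier traits, which is part of why the problem is hard.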

It’s an approach that can
be used for everything,

but using it in genomics
is particularly complicated.

Little by little we grew and we wanted
to build different challenges.

We started from the beginning,
from common traits.

Common traits are comfortable
because they are common,

everyone has them.

So we started to ask our questions:

Can we predict height?

Can we read the books
and predict your height?

Well, we actually can,

with five centimeters of precision.

BMI is fairly connected to your lifestyle,

but we still can, we get in the ballpark,
eight kilograms of precision.

Can we predict eye color?

Yeah, we can.

Eighty percent accuracy.

Can we predict skin color?

Yeah we can, 80 percent accuracy.

Can we predict age?

We can, because apparently,
the code changes during your life.

It gets shorter, you lose pieces,
it gets insertions.

We read the signals, and we make a model.

Now, an interesting challenge:

Can we predict a human face?

It’s a little complicated,

because a human face is scattered
among millions of these letters.

And a human face is not
a very well-defined object.

So, we had to build an entire tier of it

to learn and teach
a machine what a face is,

and embed and compress it.

And if you’re comfortable
with machine learning,

you understand what the challenge is here.

Now, after 15 years – 15 years after
we read the first sequence –

this October, we started
to see some signals.

And it was a very emotional moment.

What you see here is a subject
coming in our lab.

This is a face for us.

So we take the real face of a subject,
we reduce the complexity,

because not everything is in your face –

lots of features and defects
and asymmetries come from your life.

We symmetrize the face,
and we run our algorithm.

The results that I show you right now,

this is the prediction we have
from the blood.

(Applause)

Wait a second.

In these seconds, your eyes are watching,
left and right, left and right,

and your brain wants
those pictures to be identical.

So I ask you to do
another exercise, to be honest.

Please search for the differences,

which are many.

The biggest amount of signal
comes from gender,

then there is age, BMI,
the ethnicity component of a human.

And scaling up over that signal
is much more complicated.

But what you see here,
even in the differences,

lets you understand
that we are in the right ballpark,

that we are getting closer.

And it’s already giving you some emotions.

This is another subject
that comes in place,

and this is a prediction.

A little smaller face, we didn’t get
the complete cranial structure,

but still, it’s in the ballpark.

This is a subject that comes in our lab,

and this is the prediction.

So these people have never been seen
in the training of the machine.

These are the so-called “held-out” set.
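
An illustrative aside, not part of the talk: a held-out set is a group of examples kept away from the model during training and used only to judge it afterwards. The sketch below shows the idea on made-up data with a simple straight-line model; the numbers, the 80/20 split, and the names are assumptions for illustration.

```python
import random

rng = random.Random(1)          # fixed seed so the toy run is repeatable

# Made-up data: one numeric feature per person and a noisy trait that depends on it.
people = []
for _ in range(100):
    feature = rng.uniform(0, 10)
    trait = 3.0 * feature + 20.0 + rng.gauss(0, 1.0)
    people.append((feature, trait))

rng.shuffle(people)
training, held_out = people[:80], people[80:]   # the model never sees held_out

# Fit a straight line to the training people only (ordinary least squares).
n = len(training)
mean_x = sum(x for x, _ in training) / n
mean_y = sum(y for _, y in training) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in training) / sum(
    (x - mean_x) ** 2 for x, _ in training
)
intercept = mean_y - slope * mean_x

# Judge the model only on people it has never seen.
held_out_error = sum(abs(slope * x + intercept - y) for x, y in held_out) / len(held_out)
print("held-out mean absolute error:", round(held_out_error, 2))
```

Because the held-out people played no part in fitting the line, their error is an honest estimate of how the model behaves on someone new.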

But these are people that you will
probably never believe.

We’re publishing everything
in a scientific publication,

you can read it.

But since we are onstage,
Chris challenged me.

I probably exposed myself
and tried to predict

someone that you might recognize.

So, in this vial of blood –
and believe me, you have no idea

what we had to do to have
this blood now, here –

in this vial of blood is the amount
of biological information

that we need to do a full genome sequence.

We just need this amount.

We ran this sequence,
and I’m going to do it with you.

And we start to layer up
all the understanding we have.

In the vial of blood,
we predicted he’s a male.

And the subject is a male.

We predict that he’s a meter and 76 cm.

The subject is a meter and 77 cm.

So, we predicted that he weighs 76 kilograms;
the subject weighs 82.

We predict his age, 38.

The subject is 35.

We predict his eye color.

Too dark.

We predict his skin color.

We are almost there.

That’s his face.

Now, the reveal moment:

the subject is this person.

(Laughter)

And I did it intentionally.

I am a very particular
and peculiar ethnicity.

Southern European, Italians –
they never fit in models.

And it’s particular – that ethnicity
is a complex corner case for our model.

But there is another point.

So, one of the things that we use
a lot to recognize people

will never be written in the genome.

It’s our free will, it’s how I look.

Not my haircut in this case,
but my beard cut.

So I’m going to show you, I’m going to,
in this case, transfer it –

and this is nothing more
than Photoshop, no modeling –

the beard on the subject.

And immediately, we get
much, much better in the feeling.

So, why do we do this?

We certainly don’t do it
for predicting height

or taking a beautiful picture
out of your blood.

We do it because the same technology
and the same approach,

the machine learning of this code,

is helping us to understand how we work,

how your body works,

how your body ages,

how disease generates in your body,

how your cancer grows and develops,

how drugs work

and if they work on your body.

This is a huge challenge.

This is a challenge that we share

with thousands of other
researchers around the world.

It’s called personalized medicine.

It’s the ability to move
from a statistical approach

where you’re a dot in the ocean,

to a personalized approach,

where we read all these books

and we get an understanding
of exactly how you are.

But it is a particularly
complicated challenge,

because of all these books, as of today,

we just know probably two percent:

four books of more than 175.

And this is not the topic of my talk,

because we will learn more.

There are the best minds
in the world on this topic.

The prediction will get better,

the model will get more precise.

And the more we learn,

the more we will
be confronted with decisions

that we never had to face before

about life,

about death,

about parenting.

So, we are touching the very
inner detail on how life works.

And it’s a revolution
that cannot be confined

in the domain of science or technology.

This must be a global conversation.

We must start to think of the future
we’re building as a humanity.

We need to interact with creatives,
with artists, with philosophers,

with politicians.

Everyone is involved,

because it’s the future of our species.

Without fear, but with the understanding

that the decisions
that we make in the next year

will change the course of history forever.

Thank you.

(Applause)
