How data is helping us unravel the mysteries of the brain
Steve McCarroll

Nine years ago,

my sister discovered lumps
in her neck and arm

and was diagnosed with cancer.

From that day, she started to benefit

from the understanding
that science has of cancer.

Every time she went to the doctor,

they measured specific molecules

that gave them information
about how she was doing

and what to do next.

New medical options
became available every few years.

Everyone recognized
that she was struggling heroically

with a biological illness.

This spring, she received
an innovative new medical treatment

in a clinical trial.

It dramatically knocked back her cancer.

Guess who I’m going to spend
this Thanksgiving with?

My vivacious sister,

who gets more exercise than I do,

and who, like perhaps
many people in this room,

increasingly talks about a lethal illness

in the past tense.

Science can, in our lifetimes –
even in a decade –

transform what it means
to have a specific illness.

But not for all illnesses.

My friend Robert and I
were classmates in graduate school.

Robert was smart,

but with each passing month,

his thinking seemed to become
more disorganized.

He dropped out of school,
got a job in a store …

But that, too, became too complicated.

Robert became fearful and withdrawn.

A year and a half later,
he started hearing voices

and believing that people
were following him.

Doctors diagnosed him with schizophrenia,

and they gave him
the best drug they could.

That drug made the voices
somewhat quieter,

but it didn’t restore his bright mind
or his social connectedness.

Robert struggled to remain connected

to the worlds of school
and work and friends.

He drifted away,

and today I don’t know where to find him.

If he watches this,

I hope he’ll find me.

Why does medicine have
so much to offer my sister,

and so much less to offer
millions of people like Robert?

The need is there.

The World Health Organization
estimates that brain illnesses

like schizophrenia, bipolar disorder
and major depression

are the world’s largest cause
of lost years of life and work.

That’s in part because these illnesses
often strike early in life,

in many ways, in the prime of life,

just as people are finishing
their educations, starting careers,

forming relationships and families.

These illnesses can result in suicide;

they often compromise one’s ability
to work at one’s full potential;

and they’re the cause of so many
tragedies harder to measure:

lost relationships and connections,

missed opportunities
to pursue dreams and ideas.

These illnesses limit human possibilities

in ways we simply cannot measure.

We live in an era in which
there’s profound medical progress

on so many other fronts.

My sister’s cancer story
is a great example,

and we could say the same
of heart disease.

Drugs like statins will prevent
millions of heart attacks and strokes.

When you look at these areas
of profound medical progress

in our lifetimes,

they have a narrative in common:

scientists discovered molecules
that matter to an illness,

they developed ways to detect
and measure those molecules in the body,

and they developed ways
to interfere with those molecules

using other molecules – medicines.

It’s a strategy that has worked
again and again and again.

But when it comes to the brain,
that strategy has been limited,

because today, we don’t know
nearly enough, yet,

about how the brain works.

We need to learn which of our cells
matter to each illness,

and which molecules in those cells
matter to each illness.

And that’s the mission
I want to tell you about today.

My lab develops technologies
with which we try to turn the brain

into a big-data problem.

You see, before I became a biologist,
I worked in computers and math,

and I learned this lesson:

wherever you can collect vast amounts
of the right kinds of data

about the functioning of a system,

you can use computers in powerful new ways

to make sense of that system
and learn how it works.

Today, big-data approaches
are transforming

ever-larger sectors of our economy,

and they could do the same
in biology and medicine, too.

But you have to have
the right kinds of data.

You have to have data
about the right things.

And that often requires
new technologies and ideas.

And that is the mission that animates
the scientists in my lab.

Today, I want to tell you
two short stories from our work.

One fundamental obstacle we face

in trying to turn the brain
into a big-data problem

is that our brains are composed of
and built from billions of cells.

And our cells are not generalists;
they’re specialists.

Like humans at work,

they specialize into thousands
of different cellular careers,

or cell types.

In fact, each of
the cell types in our body

could probably give a lively TED Talk

about what it does at work.

But as scientists,
we don’t even know today

how many cell types there are,

and we don’t know what the titles
of most of those talks would be.

Now, we know many
important things about cell types.

They can differ dramatically
in size and shape.

One will respond to a molecule
that the other doesn’t respond to;

they’ll make different molecules.

But science has largely
been reaching these insights

in an ad hoc way, one cell type at a time,

one molecule at a time.

We wanted to make it possible to learn
all of this quickly and systematically.

Now, until recently, it was the case

that if you wanted to inventory
all of the molecules

in a part of the brain or any organ,

you had to first grind it up
into a kind of cellular smoothie.

But that’s a problem.

As soon as you’ve ground up the cells,

you can only study the contents
of the average cell –

not the individual cells.

Imagine if you were trying to understand
how a big city like New York works,

but you could only do so
by reviewing some statistics

about the average resident of New York.

Of course, you wouldn’t learn very much,

because everything that’s interesting
and important and exciting

is in all the diversity
and the specializations.

And the same thing is true of our cells.
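As a rough illustration of the smoothie problem (a sketch with made-up numbers, not data from the talk), suppose six cells belong to two hypothetical cell types, "X" and "Y", and each type uses a different hypothetical gene. Averaging over all the cells at once describes a cell that doesn’t exist; averaging within each type recovers the specialization.

```python
from statistics import mean

# Hypothetical expression levels (arbitrary units) for two made-up genes
# in six cells: three of "type X" and three of "type Y".
cells = [
    {"type": "X", "gene_a": 10, "gene_b": 0},
    {"type": "X", "gene_a": 12, "gene_b": 0},
    {"type": "X", "gene_a": 8, "gene_b": 0},
    {"type": "Y", "gene_a": 0, "gene_b": 10},
    {"type": "Y", "gene_a": 0, "gene_b": 9},
    {"type": "Y", "gene_a": 0, "gene_b": 11},
]

GENES = ("gene_a", "gene_b")


def bulk_average(group):
    """The 'smoothie' view: one average per gene across all given cells."""
    return {gene: mean(cell[gene] for cell in group) for gene in GENES}


def per_type_average(group):
    """The 'fruit salad' view: average per gene within each cell type."""
    result = {}
    for cell_type in sorted({cell["type"] for cell in group}):
        members = [cell for cell in group if cell["type"] == cell_type]
        result[cell_type] = bulk_average(members)
    return result


smoothie = bulk_average(cells)
fruit_salad = per_type_average(cells)
print("smoothie:", smoothie)
print("fruit salad:", fruit_salad)
```

In the smoothie view both genes look moderately active in "the average cell," even though no single cell in this toy example uses both.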

And we wanted to make it possible to study
the brain not as a cellular smoothie

but as a cellular fruit salad,

in which one could generate
data about and learn from

each individual piece of fruit.

So we developed
a technology for doing that.

You’re about to see a movie of it.

Here we’re packaging
tens of thousands of individual cells,

each into its own tiny water droplet

for its own molecular analysis.

When a cell lands in a droplet,
it’s greeted by a tiny bead,

and that bead delivers millions
of DNA bar code molecules.

And each bead delivers
a different bar code sequence

to a different cell.

We incorporate the DNA bar codes

into each cell’s RNA molecules.

Those are the molecular
transcripts it’s making

of the specific genes
that it’s using to do its job.

And then we sequence billions
of these combined molecules

and use the sequences to tell us

which cell and which gene

every molecule came from.

We call this approach “Drop-seq,”
because we use droplets

to separate the cells for analysis,

and we use DNA sequences
to tag and inventory

and keep track of everything.
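The bookkeeping idea behind the bar codes can be sketched in a few lines. This is a toy example under loud assumptions, not the lab’s actual software: the bar code sequences and gene names are invented, and each sequenced molecule is assumed to have already been reduced to a (bar code, gene) pair. Real analysis also has to match raw sequences to genes and cope with sequencing errors.

```python
from collections import Counter, defaultdict

# Toy sequenced molecules: (bar code sequence, gene the RNA came from).
# The bar codes and gene names are made up for illustration.
reads = [
    ("ACGT", "GENE1"),
    ("ACGT", "GENE1"),
    ("ACGT", "GENE2"),
    ("TTAG", "GENE2"),
    ("TTAG", "GENE3"),
    ("ACGT", "GENE1"),
]


def count_by_cell(molecules):
    """Group molecules by bar code (one bar code per cell) and count genes."""
    counts = defaultdict(Counter)
    for barcode, gene in molecules:
        counts[barcode][gene] += 1
    return counts


cell_counts = count_by_cell(reads)
for barcode, genes in sorted(cell_counts.items()):
    print(barcode, dict(genes))
```

Because every molecule carries the bar code of the droplet it came from, the cells can be mixed together for sequencing and sorted out afterward in software.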

And now, whenever we do an experiment,

we analyze tens of thousands
of individual cells.

And today in this area of science,

the challenge is increasingly
how to learn as much as we can

as quickly as we can

from these vast data sets.

When we were developing Drop-seq,
people used to tell us,

“Oh, this is going to make you guys
the go-to for every major brain project.”

That’s not how we saw it.

Science is best when everyone
is generating lots of exciting data.

So we wrote a 25-page instruction book,

with which any scientist could build
their own Drop-seq system from scratch.

And that instruction book has been
downloaded from our lab website

50,000 times in the past two years.

We wrote software
that any scientist could use

to analyze the data
from Drop-seq experiments,

and that software is also free,

and it’s been downloaded from our website
30,000 times in the past two years.

And hundreds of labs have written us
about discoveries that they’ve made

using this approach.

Today, this technology is being used
to make a human cell atlas.

It will be an atlas of all
of the cell types in the human body

and the specific genes
that each cell type uses to do its job.

Now I want to tell you about
a second challenge that we face

in trying to turn the brain
into a big-data problem.

And that challenge is that
we’d like to learn from the brains

of hundreds of thousands of living people.

But our brains are not physically
accessible while we’re living.

So how can we discover molecular factors
if we can’t hold the molecules?

An answer comes from the fact that
the most informative molecules, proteins,

are encoded in our DNA,

which has the recipes our cells follow
to make all of our proteins.

And these recipes vary
from person to person to person

in ways that cause the proteins
to vary from person to person

in their precise sequence

and in how much each cell type
makes of each protein.

It’s all encoded in our DNA,
and it’s all genetics,

but it’s not the genetics
that we learned about in school.

Do you remember big B, little b?

If you inherit big B, you get brown eyes?

It’s simple.

Very few traits are that simple.

Even eye color is shaped by much more
than a single pigment molecule.

And something as complex
as the function of our brains

is shaped by the interaction
of thousands of genes.

And each of these genes
varies meaningfully

from person to person to person,

and each of us is a unique
combination of that variation.

It’s a big-data opportunity.

And today, it’s increasingly
possible to make progress

on a scale that was never possible before.

People are contributing to genetic studies

in record numbers,

and scientists around the world
are sharing the data with one another

to speed progress.

I want to tell you a short story
about a discovery we recently made

about the genetics of schizophrenia.

It was made possible
by 50,000 people from 30 countries,

who contributed their DNA
to genetic research on schizophrenia.

It had been known for several years

that the human genome’s largest influence
on risk of schizophrenia

comes from a part of the genome

that encodes many of the molecules
in our immune system.

But it wasn’t clear which gene
was responsible.

A scientist in my lab developed
a new way to analyze DNA with computers,

and he discovered something
very surprising.

He found that a gene called
“complement component 4” –

it’s called “C4” for short –

comes in dozens of different forms
in different people’s genomes,

and these different forms
make different amounts

of C4 protein in our brains.

And he found that the more
C4 protein our genes make,

the greater our risk for schizophrenia.

Now, C4 is still just one risk factor
in a complex system.

This isn’t big B,

but it’s an insight about
a molecule that matters.

Complement proteins like C4
were known for a long time

for their roles in the immune system,

where they act as a kind of
molecular Post-it note

that says, “Eat me.”

And that Post-it note
gets put on lots of debris

and dead cells in our bodies

and invites immune cells
to eliminate them.

But two colleagues of mine found
that the C4 Post-it note

also gets put on synapses in the brain

and prompts their elimination.

Now, the creation and elimination
of synapses is a normal part

of human development and learning.

Our brains create and eliminate
synapses all the time.

But our genetic results suggest
that in schizophrenia,

the elimination process
may go into overdrive.

Scientists at many drug companies tell me
they’re excited about this discovery,

because they’ve been working
on complement proteins for years

in the immune system,

and they’ve learned a lot
about how they work.

They’ve even developed molecules
that interfere with complement proteins,

and they’re starting to test them
in the brain as well as the immune system.

It’s potentially a path toward a drug
that might address a root cause

rather than an individual symptom,

and we hope very much that this work
by many scientists over many years

will be successful.

But C4 is just one example

of the potential for data-driven
scientific approaches

to open new fronts on medical problems
that are centuries old.

There are hundreds of places
in our genomes

that shape risk for brain illnesses,

and any one of them could lead us
to the next molecular insight

about a molecule that matters.

And there are hundreds of cell types that
use these genes in different combinations.

As we and other scientists
work to generate

the rest of the data that’s needed

and to learn all that we can
from that data,

we hope to open many more new fronts.

Genetics and single-cell analysis
are just two ways

of trying to turn the brain
into a big-data problem.

There is so much more we can do.

Scientists in my lab
are creating a technology

for quickly mapping the synaptic
connections in the brain

to tell which neurons are talking
to which other neurons

and how that conversation changes
throughout life and during illness.

And we’re developing a way
to test in a single tube

how cells with hundreds
of different people’s genomes

respond differently to the same stimulus.

These projects bring together
people with diverse backgrounds

and training and interests –

biology, computers, chemistry,
math, statistics, engineering.

But the scientific possibilities
rally people with diverse interests

into working intensely together.

What’s the future
that we could hope to create?

Consider cancer.

We’ve moved from an era of ignorance
about what causes cancer,

in which cancer was commonly ascribed
to personal psychological characteristics,

to a modern molecular understanding
of the true biological causes of cancer.

That understanding today
leads to innovative medicine

after innovative medicine,

and although there’s still
so much work to do,

we’re already surrounded by people
who have been cured of cancers

that were considered untreatable
a generation ago.

And millions of cancer survivors
like my sister

find themselves with years of life
that they didn’t take for granted

and new opportunities

for work and joy and human connection.

That is the future that we are determined
to create around mental illness –

one of real understanding and empathy

and limitless possibility.

Thank you.

(Applause)
