The next software revolution: programming biological cells | Sara-Jane Dunn

The second half of the last century
was completely defined

by a technological revolution:

the software revolution.

The ability to program electrons
on a material called silicon

made possible technologies,
companies and industries

that were at one point
unimaginable to many of us,

but which have now fundamentally changed
the way the world works.

The first half of this century, though,

is going to be transformed
by a new software revolution:

the living software revolution.

And this will be powered by the ability
to program biochemistry

on a material called biology.

And doing so will enable us to harness
the properties of biology

to generate new kinds of therapies,

to repair damaged tissue,

to reprogram faulty cells

or even build programmable
operating systems out of biochemistry.

If we can realize this –
and we do need to realize it –

its impact will be so enormous

that it will make the first
software revolution pale in comparison.

And that’s because living software
would transform the entirety of medicine,

agriculture and energy,

and these are sectors that dwarf
those dominated by IT.

Imagine programmable plants
that fix nitrogen more effectively

or resist emerging fungal pathogens,

or even programming crops
to be perennial rather than annual

so you could double
your crop yields each year.

That would transform agriculture

and how we’ll keep our growing
and global population fed.

Or imagine programmable immunity,

designing and harnessing molecular devices
that guide your immune system

to detect, eradicate
or even prevent disease.

This would transform medicine

and how we’ll keep our growing
and aging population healthy.

We already have many of the tools
that will make living software a reality.

We can precisely edit genes with CRISPR.

We can rewrite the genetic code
one base at a time.

We can even build functioning
synthetic circuits out of DNA.

But figuring out how and when
to wield these tools

is still a process of trial and error.

It needs deep expertise,
years of specialization.

And experimental protocols
are difficult to discover

and all too often, difficult to reproduce.

And, you know, we have a tendency
in biology to focus a lot on the parts,

but we all know that something like flying
wouldn’t be understood

by only studying feathers.

So programming biology is not yet
as simple as programming your computer.

And then to make matters worse,

living systems largely bear no resemblance
to the engineered systems

that you and I program every day.

In contrast to engineered systems,
living systems self-generate,

they self-organize,

they operate at molecular scales.

And these molecular-level interactions

lead generally to robust
macro-scale output.

They can even self-repair.

Consider, for example,
the humble household plant,

like that one sat
on your mantelpiece at home

that you keep forgetting to water.

Every day, despite your neglect,
that plant has to wake up

and figure out how
to allocate its resources.

Will it grow, photosynthesize,
produce seeds, or flower?

And that’s a decision that has to be made
at the level of the whole organism.

But a plant doesn’t have a brain
to figure all of that out.

It has to make do
with the cells on its leaves.

They have to respond to the environment

and make the decisions
that affect the whole plant.

So somehow there must be a program
running inside these cells,

a program that responds
to input signals and cues

and shapes what that cell will do.

And then those programs must operate
in a distributed way

across individual cells,

so that they can coordinate
and that plant can grow and flourish.

If we could understand
these biological programs,

if we could understand
biological computation,

it would transform our ability
to understand how and why

cells do what they do.

Because, if we understood these programs,

we could debug them when things go wrong.

Or we could learn from them how to design
the kind of synthetic circuits

that truly exploit
the computational power of biochemistry.

My passion about this idea
led me to a career in research

at the interface of maths,
computer science and biology.

And in my work, I focus on the concept
of biology as computation.

And that means asking
what do cells compute,

and how can we uncover
these biological programs?

And I started to ask these questions
together with some brilliant collaborators

at Microsoft Research
and the University of Cambridge,

where together we wanted to understand

the biological program
running inside a unique type of cell:

an embryonic stem cell.

These cells are unique
because they’re totally naïve.

They can become anything they want:

a brain cell, a heart cell,
a bone cell, a lung cell,

any adult cell type.

This naïvety, it sets them apart,

but it also ignited the imagination
of the scientific community,

who realized, if we could
tap into that potential,

we would have a powerful
tool for medicine.

If we could figure out
how these cells make the decision

to become one cell type or another,

we might be able to harness them

to generate cells that we need
to repair diseased or damaged tissue.

But realizing that vision
is not without its challenges,

not least because these particular cells,

they emerge just six days
after conception.

And then within a day or so, they’re gone.

They have set off down the different paths

that form all the structures
and organs of your adult body.

But it turns out that cell fates
are a lot more plastic

than we might have imagined.

About 13 years ago, some scientists
showed something truly revolutionary.

By inserting just a handful of genes
into an adult cell,

like one of your skin cells,

you can transform that cell
back to the naïve state.

And it’s a process that’s actually
known as “reprogramming,”

and it allows us to imagine
a kind of stem cell utopia,

the ability to take a sample
of a patient’s own cells,

transform them back to the naïve state

and use those cells to make
whatever that patient might need,

whether it’s brain cells or heart cells.

But over the last decade or so,

figuring out how to change cell fate,

it’s still a process of trial and error.

Even in cases where we’ve uncovered
successful experimental protocols,

they’re still inefficient,

and we lack a fundamental understanding
of how and why they work.

If you figured out how to change
a stem cell into a heart cell,

that still doesn’t tell you
how to change a stem cell

into a brain cell.

So we wanted to understand
the biological program

running inside an embryonic stem cell,

and understanding the computation
performed by a living system

starts with asking
a devastatingly simple question:

What is it that system actually has to do?

Now, computer science actually
has a set of strategies

for dealing with what it is the software
and hardware are meant to do.

When you write a program,
you code a piece of software,

you want that software to run correctly.

You want performance, functionality.

You want to prevent bugs.

They can cost you a lot.

So when a developer writes a program,

they could write down
a set of specifications.

These are what your program should do.

Maybe it should compare
the size of two numbers

or order numbers by increasing size.

Technology exists that allows us
automatically to check

whether our specifications are satisfied,

whether that program
does what it should do.
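As a rough illustration of checking a specification automatically, the sketch below writes the "order numbers by increasing size" specification as a function and tests a program against it on every small input. The names `order_numbers`, `meets_specification` and `find_counterexample` are made up for this example, and bounded exhaustive testing is a much simpler stand-in for the verification technology the talk refers to.

```python
from collections import Counter
from itertools import product


def order_numbers(values):
    """Program under test: order numbers by increasing size."""
    return sorted(values)


def meets_specification(inputs, output):
    """Specification: the output is in increasing order and
    contains exactly the same numbers as the input."""
    in_order = all(a <= b for a, b in zip(output, output[1:]))
    same_numbers = Counter(inputs) == Counter(output)
    return in_order and same_numbers


def find_counterexample(program, domain, max_length):
    """Run the program on every list of up to max_length numbers
    drawn from domain; return the first input that violates the
    specification, or None if every input satisfies it."""
    for length in range(max_length + 1):
        for candidate in product(domain, repeat=length):
            inputs = list(candidate)
            if not meets_specification(inputs, program(list(inputs))):
                return inputs
    return None


if __name__ == "__main__":
    # A correct program has no counterexample in the bounded search.
    print(find_counterexample(order_numbers, [0, 1, 2], 4))
    # A "program" that returns its input unchanged is caught.
    print(find_counterexample(lambda values: values, [0, 1, 2], 4))
```

The specification is kept separate from the program, so the same check can be pointed at any candidate implementation.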

And so our idea was that in the same way,

experimental observations,
things we measure in the lab,

they correspond to specifications
of what the biological program should do.

So we just needed to figure out a way

to encode this new type of specification.

So let’s say you’ve been busy in the lab
and you’ve been measuring your genes

and you’ve found that if Gene A is active,

then Gene B or Gene C seems to be active.

We can write that observation down
as a mathematical expression

if we use the language of logic:

If A, then B or C.

Now, this is a very simple example, OK.

It’s just to illustrate the point.
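The observation "if A, then B or C" can be written as a small logical check. The sketch below assumes each gene is simply on or off, and the helper names are invented for illustration. It lists every combination of the three genes and reports which ones would contradict the observation.

```python
from itertools import product

GENES = ("A", "B", "C")


def observation_holds(state):
    """The observation 'if A, then B or C' as a logical check.
    An implication is only violated when A is active and
    neither B nor C is."""
    return (not state["A"]) or state["B"] or state["C"]


def all_states():
    """Every on/off combination of the three genes."""
    return [dict(zip(GENES, values))
            for values in product([False, True], repeat=len(GENES))]


def violating_states():
    """States that would contradict the observation."""
    return [state for state in all_states() if not observation_holds(state)]


if __name__ == "__main__":
    for state in violating_states():
        print(state)
```

Of the eight possible states, only the one with A on and both B and C off is ruled out, which is exactly what the implication says.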

We can encode truly rich expressions

that actually capture the behavior
of multiple genes or proteins over time

across multiple different experiments.

And so by translating our observations

into mathematical expressions in this way,

it becomes possible to test whether
or not those observations can emerge

from a program of genetic interactions.

And we developed a tool to do just this.

We were able to use this tool
to encode observations

as mathematical expressions,

and then that tool would allow us
to uncover the genetic program

that could explain them all.
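To make the idea concrete, here is a toy sketch of searching for a genetic program that explains a set of observations. Everything in it is an assumption made for illustration: three on/off genes, a tiny rule library in which each gene copies or inverts one regulator, invented observations, and plain brute-force enumeration. It is not the tool described in the talk, which is built on a solver.

```python
from itertools import product

GENES = ("A", "B", "C")

# A candidate rule for one gene: (index of the regulating gene, negate?).
# (0, False) reads "copy A"; (2, True) reads "on when C is off".
CANDIDATE_RULES = [(source, negate)
                   for source in range(len(GENES))
                   for negate in (False, True)]


def step(program, state):
    """Apply every gene's rule at once to get the next state."""
    return tuple(int(bool(state[source]) != negate)
                 for source, negate in program)


def run(program, state, steps):
    """Advance the state by the given number of synchronous steps."""
    for _ in range(steps):
        state = step(program, state)
    return state


# Hypothetical observations: (starting state, number of steps, state seen).
OBSERVATIONS = [
    ((1, 0, 0), 1, (1, 1, 0)),
    ((1, 0, 0), 2, (1, 1, 1)),
    ((0, 1, 1), 1, (0, 0, 1)),
    ((0, 1, 1), 2, (0, 0, 0)),
]


def explains_all(program, observations):
    """True if the program reproduces every observation."""
    return all(run(program, start, steps) == seen
               for start, steps, seen in observations)


def consistent_programs(observations):
    """Enumerate every assignment of one rule per gene and keep
    those that reproduce every observation."""
    return [program
            for program in product(CANDIDATE_RULES, repeat=len(GENES))
            if explains_all(program, observations)]


if __name__ == "__main__":
    for program in consistent_programs(OBSERVATIONS):
        print(program)
```

Even in this tiny setting more than one candidate can survive, which is why adding further observations, or testing predictions in the lab, helps narrow the set down.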

And we then applied this approach

to uncover the genetic program
running inside embryonic stem cells

to see if we could understand
how to induce that naïve state.

And this tool was actually built

on a solver that’s deployed
routinely around the world

for conventional software verification.

So we started with a set
of nearly 50 different specifications

that we generated from experimental
observations of embryonic stem cells.

And by encoding these
observations in this tool,

we were able to uncover
the first molecular program

that could explain all of them.

Now, that’s kind of a feat
in and of itself, right?

Being able to reconcile
all of these different observations

is not the kind of thing
you can do on the back of an envelope,

even if you have a really big envelope.

Because we’ve got
this kind of understanding,

we could go one step further.

We could use this program to predict
what this cell might do

in conditions we hadn’t yet tested.

We could probe the program in silico.

And so we did just that:

we generated predictions
that we tested in the lab,

and we found that this program
was highly predictive.

It told us how we could
accelerate progress

back to the naïve state
quickly and efficiently.

It told us which genes
to target to do that,

which genes might even
hinder that process.

We even found the program predicted
the order in which genes would switch on.

So this approach really allowed us
to uncover the dynamics

of what the cells are doing.
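Probing a program in silico can be sketched as a simple simulation. The three-gene program below is hypothetical (A sustains itself, A activates B, B activates C) and the function names are invented for this example. It runs the program from a starting condition and reports the order in which genes switch on.

```python
def simulate(update_rules, state, steps):
    """Return the list of states visited, updating all genes together."""
    trajectory = [dict(state)]
    for _ in range(steps):
        current = trajectory[-1]
        trajectory.append({gene: rule(current)
                           for gene, rule in update_rules.items()})
    return trajectory


def switch_on_order(trajectory):
    """Genes sorted by the first time step at which they are active;
    genes that never switch on are left out."""
    first_on = {}
    for time, state in enumerate(trajectory):
        for gene, active in state.items():
            if active and gene not in first_on:
                first_on[gene] = time
    return sorted(first_on, key=lambda gene: (first_on[gene], gene))


# Hypothetical program: A sustains itself, A activates B, B activates C.
TOY_PROGRAM = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    print(switch_on_order(simulate(TOY_PROGRAM, start, 3)))
```

Changing the starting condition or swapping a rule is the in-silico equivalent of trying a new experiment before running it at the bench.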

What we’ve developed, it’s not a method
that’s specific to stem cell biology.

Rather, it allows us to make sense
of the computation

being carried out by the cell

in the context of genetic interactions.

So really, it’s just one building block.

The field urgently needs
to develop new approaches

to understand biological
computation more broadly

and at different levels,

from DNA right through
to the flow of information between cells.

Only this kind of
transformative understanding

will enable us to harness biology
in ways that are predictable and reliable.

But to program biology,
we will also need to develop

the kinds of tools and languages

that allow both experimentalists
and computational scientists

to design biological function

and have those designs compile down
to the machine code of the cell,

its biochemistry,

so that we could then
build those structures.

Now, that’s something akin
to a living software compiler,

and I’m proud to be
part of a team at Microsoft

that’s working to develop one.

Though to say it’s a grand challenge
is kind of an understatement,

but if it’s realized,

it would be the final bridge
between software and wetware.

More broadly, though, programming biology
is only going to be possible

if we can transform the field
into being truly interdisciplinary.

It needs us to bridge
the physical and the life sciences,

and scientists from
each of these disciplines

need to be able to work together
with common languages

and to have shared scientific questions.

In the long term, it’s worth remembering
that many of the giant software companies

and the technology
that you and I work with every day

could hardly have been imagined

at the time we first started
programming on silicon microchips.

And if we start now to think about
the potential for technology

enabled by computational biology,

we’ll see some of the steps
that we need to take along the way

to make that a reality.

Now, there is the sobering thought
that this kind of technology

could be open to misuse.

If we’re willing to talk
about the potential

for programming immune cells,

we should also be thinking
about the potential of bacteria

engineered to evade them.

There might be people willing to do that.

Now, one reassuring thought in this

– well, less so
for the scientists –

is that biology is
a fragile thing to work with.

So programming biology
is not going to be something

you’ll be doing in your garden shed.

But because we’re at the outset of this,

we can move forward
with our eyes wide open.

We can ask the difficult
questions up front,

we can put in place
the necessary safeguards

and, as part of that,
we’ll have to think about our ethics.

We’ll have to think about putting bounds
on the implementation

of biological function.

So as part of this, research in bioethics
will have to be a priority.

It can’t be relegated to second place

in the excitement
of scientific innovation.

But the ultimate prize,
the ultimate destination on this journey,

would be breakthrough applications
and breakthrough industries

in areas from agriculture and medicine
to energy and materials

and even computing itself.

Imagine, one day we could be powering
the planet sustainably

on the ultimate green energy

if we could mimic something
that plants figured out millennia ago:

how to harness the sun’s energy
with an efficiency that is unparalleled

by our current solar cells.

If we understood that program
of quantum interactions

that allow plants to absorb
sunlight so efficiently,

we might be able to translate that
into building synthetic DNA circuits

that offer the material
for better solar cells.

There are teams and scientists working
on the fundamentals of this right now,

so perhaps if it got the right attention
and the right investment,

it could be realized in 10 or 15 years.

So we are at the beginning
of a technological revolution.

Understanding this ancient type
of biological computation

is the critical first step.

And if we can realize this,

we would enter the era
of an operating system

that runs living software.

Thank you very much.

(Applause)
