The race to sequence the human genome - Tien Nguyen

Packed inside every cell in your body
is a set of genetic instructions,

3.2 billion base pairs long.

Deciphering these directions
would be a monumental task

but could offer unprecedented insight
into the human body.

In 1990, a consortium
of 20 international research centers

embarked on the world’s largest
biological collaboration

to accomplish this mission.

The Human Genome Project proposed
to sequence the entire human genome

over 15 years
with $3 billion of public funds.

Then, seven years
before its scheduled completion,

a private company called Celera announced
that they could accomplish the same goal

in just three years
and at a fraction of the cost.

The two camps discussed a joint venture,
but talks quickly fell apart

as disagreements arose over legal
and ethical issues of genetic property.

And so the race began.

Though both teams used the same technology
to sequence the entire human genome,

it was their strategies
that made all the difference.

Their paths diverged
in the most critical of steps:

the first one.

In the Human Genome Project’s approach,

the genome was first divided into smaller,
more manageable chunks

about 150,000 base pairs long

that overlapped each other
a little bit on both ends.

Each of these fragments of DNA

was inserted into a bacterial
artificial chromosome

where it was cloned and fingerprinted.

The fingerprints showed scientists
where the fragments overlapped

without their needing to know the actual sequence.
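
To make the fingerprinting idea concrete, here is a minimal sketch in Python. It assumes a fingerprint is simply the set of piece lengths you get when a cloned fragment is cut up, which is a simplification, and every name and number in it (`fingerprints`, `shared_fraction`, `likely_overlaps`, the 0.4 threshold) is made up for illustration rather than taken from the project itself.

```python
# Hypothetical fingerprints: each clone is described only by the lengths
# of the pieces it breaks into (made-up numbers, not real data).
fingerprints = {
    "clone_A": {1200, 3400, 5100, 760, 2250},
    "clone_B": {5100, 760, 2250, 4800, 990},
    "clone_C": {4800, 990, 3100, 1500},
}

def shared_fraction(a, b):
    """Fraction of the smaller fingerprint that also appears in the other."""
    return len(a & b) / min(len(a), len(b))

def likely_overlaps(fps, threshold=0.4):
    """Return pairs of clone names whose fingerprints share enough pieces."""
    names = sorted(fps)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if shared_fraction(fps[first], fps[second]) >= threshold:
                pairs.append((first, second))
    return pairs

print(likely_overlaps(fingerprints))
```

With these made-up numbers, clone_A pairs with clone_B and clone_B pairs with clone_C, which suggests the order A, B, C without reading a single letter of DNA.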

Using the overlapping bits as a guide,

the researchers marked
each fragment’s place in the genome

to create a contiguous map,

a process that took about six years.

The cloned fragments were sequenced
in labs around the world

following one of the project’s
two major principles:

that collaboration on our shared heritage
was open to all nations.

In each case, the fragments
were arbitrarily broken up

into small, overlapping pieces
about 1,000 base pairs long.

Then, using a technology
called the Sanger method,

each piece was sequenced letter by letter.

This rigorous map-based approach
called hierarchical shotgun sequencing

minimized the risk of misassembly,

a huge hazard of sequencing genomes
with many repetitive portions,

like the human genome.
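
Why do repeats cause trouble? The sketch below is one way to picture it, using made-up reads in Python. The helper names (`overlap`, `possible_next`) and the sequences are assumptions for illustration only. The letters "CGCA" stand in for a repeat that shows up in two different chunks of the genome.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def possible_next(read, pool, min_len=4):
    """All reads in the pool that could follow `read`, judging by overlap alone."""
    return [other for other in pool
            if other != read and overlap(read, other, min_len) > 0]

# Made-up reads. Both chunks contain the repeated stretch "CGCA".
chunk_1 = ["TTGACGCA", "CGCAGGAT"]
chunk_2 = ["AAGTCGCA", "CGCATCCA"]

# Without a map, every read is compared against every other read.
whole_genome_pool = chunk_1 + chunk_2
print(possible_next("TTGACGCA", whole_genome_pool))

# With a map, a read is only compared against reads from its own chunk.
print(possible_next("TTGACGCA", chunk_1))
```

In the pooled case the read has two equally plausible neighbors, and only one of them is right. Restricting the comparison to a single mapped chunk leaves just one candidate.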

The consortium’s
“better safe than sorry” approach

contrasted starkly with Celera’s strategy
called whole genome shotgun sequencing.

It hinged on skipping
the mapping phase entirely,

a faster, though foolhardy, approach
according to some.

The entire genome was directly chopped up

into a giant heap
of small, overlapping bits.

Once these bits were sequenced
via the Sanger method,

Celera would take the formidable risk
of reconstructing the genome

using just the overlaps.
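
What does rebuilding from overlaps alone look like? Here is a toy sketch in Python, assuming a simple greedy strategy that keeps merging the two pieces with the longest overlap. This is an illustration of the general idea, not Celera's actual software, and the names `overlap` and `greedy_assemble` and the three short reads are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Made-up reads cut from the sequence ATGGCGTACCTGA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

Here the three reads stitch back into the single sequence ATGGCGTACCTGA. Comparing every read against every other is fine for three reads, but it hints at why doing this for an entire genome was considered such a formidable risk.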

But perhaps their decision
wasn’t such a gamble

because guess whose freshly completed map
was available online for free?

The Human Genome Consortium's,

in accordance with
the project’s second major principle

which held that all of the project’s data

would be shared publicly
within 24 hours of collection.

So in 1998, scientists around the world

were furiously sequencing
lines of genetic code

using the tried and true, yet laborious,
Sanger method.

Finally, after three exhausting years
of continuous sequencing and assembling,

the verdict was in.

In February 2001, both groups
simultaneously published

working drafts of more than 90%
of the human genome,

several years ahead
of the consortium’s schedule.

The race ended in a tie.

The Human Genome Project’s practice
of immediately sharing its data

was an unusual one.

It is more typical for scientists
to closely guard their data

until they are able to analyze it
and publish their conclusions.

Instead, the Human Genome Project
accelerated the pace of research

and created an international
collaboration on an unprecedented scale.

Since then, robust investment in both
the public and private sectors

has led to the identification
of many disease-related genes

and remarkable advances
in sequencing technology.

Today, a person’s genome can be sequenced
in just a few days.

However, reading the genome
is only the first step.

We’re a long way away from understanding
what most of our genes do

and how they are controlled.

Those are some of the challenges

for the next generation
of ambitious research initiatives.
