How to sequence the human genome - Mark J. Kiel

You’ve probably heard of the human genome,

the huge collection of genes

inside each and every one of your cells.

You probably also know

that we’ve sequenced the human genome,

but what does that actually mean?

How do you sequence someone’s genome?

Let’s back up a bit.

What is a genome?

Well, a genome is all the genes, plus some extra DNA,

that make up an organism.

Genes are made up of DNA,

and DNA is made up of long, paired strands

of A’s,

T’s,

C’s,

and G’s.

Your genome is the code

that your cells use to know how to behave.

Cells interacting together make tissues.

Tissues cooperating with each other make organs.

Organs cooperating with each other

make an organism,

you!

So, you are who you are

in large part because of your genome.

The first human genome

was sequenced ten years ago

and was no easy task.

It took two decades to complete,

required the effort of hundreds of scientists

across dozens of countries,

and cost over three billion dollars.

But some day very soon,

it will be possible to know the sequence of letters

that make up your own personal genome

all in a matter of minutes

and for less than the cost

of a pretty nice birthday present.

How is that possible?

Let’s take a closer look.

Knowing the sequence of the billions of letters

that make up your genome

is the goal of genome sequencing.

A genome is both really, really big

and very, very small.

The individual letters of DNA,

the A’s, T’s, G’s, and C’s,

are only eight or ten atoms wide,

and they’re all packed together into a clump,

like a ball of yarn.

So, to get all that information

out of that tiny space,

scientists first have to break

the long string of DNA down into smaller pieces.

Each of these pieces is then separated in space

and sequenced individually,

but how?

It’s helpful to remember

that DNA binds to other DNA

if the sequences are the exact opposite of each other.

A’s bind to T’s,

and T’s bind to A’s.

G’s bind to C’s,

and C’s to G’s.

If the A-T-G-C sequences of two pieces of DNA

are exact opposites,

they stick together.
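
The pairing rule can be written out as a small sketch in Python. The names `PAIRS`, `opposite`, and `sticks_together` are made up for illustration, and the sketch makes a simplifying assumption: it compares the two pieces letter by letter and ignores the fact that real DNA strands run in opposite directions.

```python
# Pairing rules from the talk: A binds T, and G binds C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def opposite(strand: str) -> str:
    """Return the letter-by-letter opposite of a strand."""
    return "".join(PAIRS[letter] for letter in strand)


def sticks_together(first: str, second: str) -> bool:
    """True when every letter of `first` faces its partner in `second`.

    Simplification: strand direction is ignored.
    """
    return len(first) == len(second) and opposite(first) == second


print(opposite("ATGC"))                 # TACG
print(sticks_together("ATGC", "TACG"))  # True
print(sticks_together("ATGC", "TACC"))  # False
```

A single mismatched letter is enough for this toy check to fail, which is the "exact opposites" condition in its strictest form.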

Because the genome pieces

are so very small,

we need some way to increase

the signal we can detect

from each of the individual letters.

In the most common method,

scientists use enzymes to make thousands of copies

of each genome piece.

So, we now have thousands of replicas

of each of the genome pieces,

all with the same sequence

of A’s, T’s, G’s, and C’s.

But we have to read them all somehow.

To do this, we need to make

a batch of special letters,

each with a distinct color.

A mixture of these special colored letters and enzymes

is then added to the genome

we’re trying to read.

At each spot on the genome,

one of the special letters

binds to its opposite letter,

so we now have a double-stranded piece of DNA

with a colorful spot at each letter.

Scientists then take pictures

of each snippet of genome.

Seeing the order of the colors

allows us to read the sequence.
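
Here is a rough sketch in Python of how the order of colors turns back into letters. The color assignments in `COLOR_TO_LETTER` are an assumption chosen for illustration, not the colors any real sequencing machine uses, and the function names are hypothetical. Because each colored letter binds to its opposite, the colors spell out the opposite strand, so the sketch flips each letter to recover the piece being read.

```python
# Assumed color code, for illustration only.
COLOR_TO_LETTER = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}

# Pairing rules from the talk: A binds T, and G binds C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_colors(colors: list[str]) -> str:
    """Turn the observed colors into the colored letters that bound."""
    return "".join(COLOR_TO_LETTER[color] for color in colors)


def original_piece(colors: list[str]) -> str:
    """Recover the piece being read by taking the opposite of each colored letter."""
    return "".join(PAIRS[letter] for letter in read_colors(colors))


spots = ["red", "green", "blue", "yellow"]
print(read_colors(spots))     # TACG
print(original_piece(spots))  # ATGC
```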

The sequences of each

of these millions of pieces of DNA

are stitched together using computer programs

to create a complete sequence of the entire genome.
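
The talk doesn't say how those computer programs work, so the following is only a toy sketch in Python of one simple idea: repeatedly merge the two pieces whose ends overlap the most. The function names and the `min_len` parameter are assumptions made for illustration, and real assembly software is far more elaborate (it has to cope with reading errors, repeated stretches, and millions of pieces).

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(pieces: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    pieces = list(dict.fromkeys(pieces))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i == j:
                    continue
                size = overlap(left, right, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps by at least min_len letters
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Three overlapping pieces cut from the made-up sequence ATGCGTACGTTAGC.
print(stitch(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGC']
```

If some pieces share no overlap, `stitch` returns more than one string, which mirrors the gaps a real assembly can leave.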

This isn’t the only way

to read the letter sequences of pieces of DNA,

but it’s one of the most common.

Of course, just reading the letters in the genome

doesn’t tell us much.

It’s kind of like looking through a book

written in a language you don’t speak.

You can recognize all the letters

but still have no idea what’s going on.

So, the next step is to decipher

what the sequence means,

how your genome and my genome are different.
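
As a very small illustration of spotting where two genomes differ, this Python sketch compares two sequences letter by letter. It assumes the two sequences are the same length and already lined up, which real genomes are not (they can have letters inserted or deleted), and the example sequences and the name `differences` are made up.

```python
def differences(mine: str, yours: str) -> list[tuple[int, str, str]]:
    """List (position, my letter, your letter) wherever the two sequences differ.

    Assumes both sequences are the same length and already lined up.
    """
    if len(mine) != len(yours):
        raise ValueError("this toy comparison needs sequences of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(mine, yours))
        if a != b
    ]


print(differences("ATGCGTAC", "ATGCATAC"))  # [(4, 'G', 'A')]
```

Finding a difference like this is the easy part; working out whether it matters is the interpretation step.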

Interpreting the genes of the genome

is the part scientists are still working on.

While not every difference is consequential,

the sum of these differences

is responsible for differences

in how we look,

what we like,

how we act,

and even how likely we are to get sick

or respond to specific medicines.

Better understanding of how disparities

between our genomes

account for these differences

is sure to change the way we think

not only about how doctors treat their patients,

but also how we treat each other.
