Marching to Our Own Sequences

Study finds DNA replication timing varies among people

Steven McCarroll and Amnon Koren describe their surprising findings and how they made their discovery by tapping into an existing online database of genome sequencing data. Video: Rick Groleau and Stephanie Dutchen

Imagine being asked to copy a library of books. Doing it yourself would take forever. You’d probably call some friends and come up with a plan to divide and conquer.

That’s what a human cell does when faced with the task of replicating six billion letters of DNA each time it divides. Instead of reading each chromosome in one slow pass, DNA replication machinery dives in at many origin points. Some segments get copied earlier or later than others.

A new study from geneticists at Harvard Medical School and the Broad Institute of Harvard and MIT has found that this replication plan—including where the origin points are and in what order DNA segments get copied—varies from person to person.

The study, published online Nov. 13 in Cell, also identifies the first genetic variants that orchestrate replication timing.

“Everyone’s cells have a plan for copying the genome. The idea that we don’t all have the same plan is surprising and interesting,” said Steven McCarroll, assistant professor of genetics at HMS, director of genetics for the Stanley Center for Psychiatric Research at the Broad and senior author of the paper.

“It’s a new form of variation in people no one had expected,” said first author Amnon Koren, postdoctoral fellow at HMS and the Broad. “That’s very exciting.”

Hidden orchestrator

DNA replication is one of the most fundamental cellular processes, and any variation among people is likely to affect genetic inheritance, including individual disease risk as well as human evolution, the authors said.

It’s been known that replication timing affects mutation rates; DNA segments that are copied too late or too early tend to have more errors. The new study indicates that people with different timing programs therefore have different patterns of mutation risk across their genomes.

For example, McCarroll’s team found that differences in replication timing could explain why some people are more prone than others to certain blood cancers.

Researchers had previously known that acquired mutations in the gene Janus kinase 2, or JAK2, lead to these cancers. They had also noticed that people with such JAK2 mutations tend to have a distinctive set of inherited genetic variants nearby, but they weren’t sure how the inherited variants and the new mutations were connected. McCarroll’s team found that the inherited variants are associated with an “unusually early” replication origin point and proposed that JAK2 is more likely to develop mutations in people with that very early origin point.

“Replication timing may be a way that inherited variation contributes to the risk of later mutations and diseases that we usually think of as arising by chance,” said McCarroll.

Untapped riches

McCarroll, Koren and colleagues were able to make these discoveries in large part because they invented a new way to obtain DNA replication timing data. It turned out to be hiding in plain sight.

Until now, to study replication timing, scientists needed to painstakingly “grow cells for a couple of weeks and sort them with a special machine and do a big, complicated, expensive, time-consuming experiment”—all to obtain material from just a few people at a time, said Koren.

The team suspected there was an easier way. They turned to the 1000 Genomes Project, which maintains an online database of sequencing data collected from hundreds of people around the world.

Because much of the DNA in the 1000 Genomes Project had been extracted from actively dividing cells, the team hypothesized that information about replication timing lurked within.

They were right. They counted the number of copies of individual genes in each genome. Because early replication origins had created more segment copies at the time the sample was taken than late replication origins had, they were able to create a personalized replication timing map for each person.
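The counting idea can be illustrated with a minimal Python sketch. It is not the team's actual pipeline: the function name `timing_profile`, the fixed smoothing window and the toy read counts are all assumptions made for illustration. The sketch divides each segment's read count by the sample's median count, so segments with more copies than typical (suggesting earlier replication) come out above 1.0, then averages neighboring segments to dampen noise.

```python
from statistics import median


def timing_profile(read_counts, window=3):
    """Turn per-segment read counts into a relative replication timing profile.

    read_counts: sequencing read counts, one per consecutive genomic segment
    (a hypothetical input format, not the study's data format).
    Returns smoothed copy-number ratios: values above 1.0 suggest
    earlier-than-typical replication, values below 1.0 suggest later.
    """
    if not read_counts:
        return []
    typical = median(read_counts)
    if typical == 0:
        raise ValueError("median read count is zero; cannot normalize")

    # Normalize so that a segment with the typical copy number scores 1.0.
    ratios = [count / typical for count in read_counts]

    # Moving average over neighboring segments (window shrinks at the edges).
    half = window // 2
    smoothed = []
    for i in range(len(ratios)):
        start = max(0, i - half)
        end = min(len(ratios), i + half + 1)
        neighborhood = ratios[start:end]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


# Toy data (made up): a peak near segment 2 and a trough near segment 6.
counts = [100, 120, 150, 120, 100, 80, 70, 80, 100]
profile = timing_profile(counts)
earliest = max(range(len(profile)), key=profile.__getitem__)
latest = min(range(len(profile)), key=profile.__getitem__)
print(f"earliest-replicating segment: {earliest}, latest: {latest}")
```

In this toy example the peak in the profile marks a segment that had already been copied in many cells when the sample was taken, which is the kind of signal the researchers read as an early replication origin.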

“People had seen these patterns before, but just dismissed them as artifacts of sequencing technology,” said McCarroll. After conducting numerous tests to rule out that possibility, “we found that they reflect real biology.”

The researchers then compared each person’s copy number information with his or her genetic sequence data to see if they could match specific genetic variants to replication timing differences. From 161 samples, they identified 16 variants. The variants were short, and most were common.
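One simple way to picture this matching step is to check, variant by variant, whether the number of copies of a variant a person carries tracks that person's replication timing at a nearby segment. The Python sketch below uses a plain Pearson correlation as a stand-in; the article does not describe the study's statistical method, and the variant names, genotype counts and timing values here are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one list, so no measurable association
    return sxy / sqrt(sxx * syy)


def rank_variants(genotypes, timing):
    """Rank variants by how strongly allele count tracks replication timing.

    genotypes: dict mapping a variant name to per-person allele counts (0, 1 or 2).
    timing: per-person timing values at one genomic segment, in the same order.
    Both inputs are hypothetical structures chosen for this sketch.
    """
    scores = {name: pearson(counts, timing) for name, counts in genotypes.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data (made up): six people, two hypothetical variants.
timing = [1.30, 1.25, 1.10, 1.05, 0.90, 0.95]
genotypes = {
    "variant_a": [2, 2, 1, 1, 0, 0],  # allele count tracks timing closely
    "variant_b": [0, 1, 2, 0, 1, 2],  # weak relationship to timing
}
ranked = rank_variants(genotypes, timing)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A real analysis would also need to account for the number of variants tested and for population structure, which this sketch ignores.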

“I think this is the first time we can pinpoint genetic influences on replication timing in any organism,” said Koren.

The variants were located near replication origin points, leading the team to wonder if they affect replication timing by altering where a person’s origin points are. They also suspect that the variants work by altering chromatin structure, exposing local sequences to replication machinery. The team intends to find out. They also want to search for additional variants that control replication timing.

“These 16 variants are almost certainly just the tip of the iceberg,” said Koren.

The door is open

As more variants come to light in future studies, researchers will be better able to manipulate replication timing in the lab and learn more about how it works and what its biological significance is.

Such studies should flourish now that the team has shown that “all you need to do to study replication timing is grow cells and sequence their DNA, which everyone is doing these days,” said Koren. The new method “is much easier, faster and cheaper, and I think it will transform the field because we can now do experiments in large scale.”

“We found that there is biological information in genome sequence data,” added McCarroll. “But this was still an accidental biological experiment. Now imagine the results when we and others actually design experiments to study this phenomenon.”

This research was funded by the National Human Genome Research Institute (R01 HG 006855), the Integra-Life Seventh Framework Programme (grant #315997), the Stanley Center for Psychiatric Research, the Howard Hughes Medical Institute and the Harvard Stem Cell Institute.