Researchers have begun to appreciate the importance of copy number variation when considering the connections between DNA and disease.
Most people have two copies of most genes. But some have only one copy, or three, or none. There have been hints that copy number variation (CNV) might extend well beyond the range of zero to three copies, but such extremes have been hard to analyze in genome sequencing data.
“For all the excitement about copy number variation in human genetics, most earlier research has been limited to the simplest form of CNV, in which you have either a missing segment or an extra copy of it,” said Steven McCarroll, assistant professor of genetics at Harvard Medical School and director of genetics for the Stanley Center for Psychiatric Research at the Broad Institute of MIT and Harvard.
“Here we came up with a way to analyze extreme forms of CNV,” he said. “Now we can start to use this exuberant form of genetic variation to help illuminate the genetic basis of disease.”
McCarroll and colleagues reported their insights about extreme CNV in Nature Genetics on Jan. 26. Their discoveries were made possible by new computational techniques that first author Bob Handsaker developed to analyze whole-genome sequence data from thousands of genomes at once.
“Before, we had no good way to study genes that have a really high copy number, above four,” said Handsaker, a research scientist in the McCarroll lab. “Now we can find places where people’s gene copy number ranges from zero to 15. It’s the first time we’ve been able to measure this kind of variation with such precision.”
“We’ve found that in hundreds of genes, there’s a wide variation in copy numbers. Now that we can measure these variations accurately, we can ask whether there are health repercussions,” said Handsaker.
The results also enrich the understanding of human genome evolution, said McCarroll.
Once they had developed a way to study extreme CNV, Handsaker, McCarroll and their team made four primary discoveries.
First: About 88 percent of gene copy number variation among humans arises from extreme copy number variants rather than simple copy number variants.
“These extreme copy number variants are a small fraction of all CNVs, but they have broader effects on genes than we anticipated,” said McCarroll.
Second: The more copies of a gene a person has, the more that gene is expressed.
“You might think this was obvious,” said Handsaker, “but in some organisms, such as plants, when you have more copies, most of them are turned off. It turns out that in humans, they’re all turned on in almost all cases.”
Third: With simple CNV, most people have two copies, while a few outliers have one or three or none. McCarroll’s team found that with extreme CNV, most people don’t have two copies; instead, their copy numbers are scattered across a wide range.
“For a lot of these CNVs with these especially exuberant differences, two randomly chosen people are actually more likely to have different numbers of copies than the same number,” said Handsaker.
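To see why a wide spread of copy numbers makes two people more likely to differ than to match, consider a small worked example. The sketch below is illustrative only: the frequency tables are hypothetical and are not taken from the study, and the function name `prob_different` is an assumption made for this example. It also assumes the two people are drawn independently from a large population. Under those assumptions, the chance that they share a copy number is the sum of the squared frequencies, and the chance that they differ is one minus that sum.

```python
def prob_different(freqs):
    """Probability that two independently chosen people differ in copy number.

    freqs maps a copy number to the fraction of people who carry it.
    All values here are hypothetical, not measurements from the study.
    """
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    # Two people match only if both carry the same copy number.
    p_same = sum(p * p for p in freqs.values())
    return 1.0 - p_same


# Hypothetical "simple" CNV: nearly everyone has two copies.
simple = {1: 0.05, 2: 0.90, 3: 0.05}

# Hypothetical "extreme" CNV: copy numbers spread over a wide range.
extreme = {2: 0.10, 3: 0.15, 4: 0.20, 5: 0.20, 6: 0.15, 7: 0.10, 8: 0.10}

print(f"simple:  {prob_different(simple):.3f}")
print(f"extreme: {prob_different(extreme):.3f}")
```

With these made-up numbers, two people differ at the simple site less than one time in five, while at the extreme site they differ well over half the time, which is the pattern Handsaker describes.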
Fourth: Sequences with more copies are more likely to mutate further, expanding in copy number quickly and dramatically.
The team found what they call “runaway duplication haplotypes,” in which some versions of a chromosome have acquired as many as 10 copies of a gene over the past thousand or so generations, while other versions of the same chromosome continue to have just one copy.
“The fast, dramatic expansion in copy number of specific genes appears to have been evolutionarily recent and geographically localized,” said McCarroll.
One gene involved in resistance to trypanosomes—parasites that cause human illnesses including sleeping sickness and Chagas disease—evolved to have a high copy number on a subset of the chromosomes in West African populations. Another gene, related to a gene that contributes to asthma resistance, evolved to have a high copy number in Europe.
“These variations show really unusual patterns in some parts of the world,” said McCarroll. “But it’s too soon to know whether they’re doing something important.”
The team is now offering to the research community “the first data resource on extreme forms of CNV and how they actually vary across a large number of people” as well as a software toolkit to analyze extreme CNV in huge sequencing data sets, McCarroll said.
“Until recently, whole-genome sequencing was quite expensive. Today, that’s changing quickly,” McCarroll added. “This work gives us a sense of the kinds of things it’s going to be possible to see in whole-genome sequences that it wasn’t possible to see before.”
Coauthor Jennifer R. Berman is an employee of Bio-Rad Inc.
This research was supported by National Human Genome Research Institute grant R01 HG006855. Additional funding from NHGRI (U01 HG006510) is supporting follow-on work to develop production-ready software that can be used by any research laboratory.