The tools for capturing the fine detail of the human genome and interpreting it for a better picture of human biology and disease are undergoing a transformation more striking than the switch to high-def TV. Greater clarity may not only enrich the understanding of molecular interactions instrumental to health and disease—thereby revealing potential new therapies—but also change the terms of an ongoing debate in human genetics.

The issue is whether common diseases are caused by an accumulation of many common genetic variants that individually have little impact on health or by different, relatively rare variants, each having a significant disease-causing effect. Put another way, is the picture of disease better represented by a composite of numerous pixels on a TV screen or by a collection of different paintings that each convey a complete image? Until now, the debate—“common disease, common variant” versus “common disease, rare variant”—has been defined largely by the differing techniques for identifying genes linked to disease. But two recent advances, the 1000 Genomes Project and HapMap 3, offer a common ground from which all researchers can pursue their investigations no matter which genetic model they favor.

“The 1000 Genomes Project is a technology development project,” said David Altshuler, project co-chair and HMS professor of genetics and of medicine at Massachusetts General Hospital, “and it’s building a public database that can be used under any model of genetics to create a reference framework, where every genetic variant that you’re likely to encounter in the population is registered.” An international effort, the project includes many researchers from HMS, Mass General, the Broad Institute and Brigham and Women’s Hospital. Initial results of the pilot phase were published in the Oct. 28 Nature.

Detailing Diversity

The 1000 Genomes Project is rooted in the International HapMap, which was produced in 2005, five years after the first draft of the human genome was published. The goal was to generate a map showing all of the human genome’s haplotypes, blocks of DNA that tend to be inherited together. The HapMap provided an atlas of some of the most common DNA variations, called single nucleotide polymorphisms, or SNPs (pronounced “snips”), having a frequency in the population of 5 percent or more.

By 2007, researchers were using this catalog of SNPs in genomewide association studies (GWAS), in which genetic variants in different individuals are correlated with distinct traits, including the presence or absence of disease. In the last three years, GWAS have uncovered more than 1,000 genetic variants that are associated with disease, whereas only about 20 were known beforehand. GWAS have been the driver of the “common disease, common variant” hypothesis on the genetic contribution to disease.

Shortly before the pilot studies of the 1000 Genomes Project were reported, the third version of the HapMap was announced, which extends the atlas of human genetic variation. HapMap 3 documents SNPs with a much lower frequency among people, down to 1 percent, and it captures small insertions and deletions of DNA (“indels”) and structural inconsistencies such as copy number variations (CNVs) of certain genetic sequences.

Based on HapMap 3, the 1000 Genomes Project sharpens the picture of human genetic variation even further. The central difficulty of genome investigation is extracting broadly meaningful information. The human genome has three billion DNA base pairs and a measure of variation from person to person among the nearly seven billion people in the world. Sequencing strategies have to be viable in terms of cost and time, and methods have to be devised to make solid sense of the oceans of data.

The three recently reported pilot studies explored different strategies for detailing genetic diversity. One involved a thorough sequencing of the genomes of six people, covering each genome an average of 20 to 60 times. A second project entailed lower coverage of 179 people’s genomes. A third project sequenced just the protein-coding regions of 1,000 genes in 700 people; protein-coding regions make up about 2 percent of the human genome. The 1000 Genomes Project will ultimately collect data from 2,500 people representing multiple populations around the world. Some of what the project generates will be computationally inferred from the experimental findings.

High-tech Power Boost

The magnitude of the sequencing effort in the 1000 Genomes Project and related programs growing out of the Human Genome Project, along with the analysis needed to turn data into useful information, has introduced an age of interdisciplinary research between biomedical researchers and computer scientists. In addition to Altshuler, key players from HMS in the 1000 Genomes Project include Mark Daly, HMS associate professor of medicine at Mass General, who is central to the data analysis team. Steven McCarroll, HMS assistant professor of genetics, and Charles Lee, HMS associate professor of pathology at Brigham and Women’s Hospital, led the structural variants group, composed of researchers from about a dozen labs around the world. A computational method developed by McCarroll and partners at the Broad Institute contributed a large portion of all the data for the project.

Prior to the current revolution in whole-genome sequencing, the standard gene search method was family-based linkage analysis. This approach targets a gene that is present in family members affected by a disease but absent from members who are unaffected. The gene is therefore likely to have a causal effect. Under the “common disease, rare variant” hypothesis, different families affected by the same disease might carry different rare causal genes. This kind of search for disease genes follows the rules of inheritance laid down by the 19th-century monk and scientist Gregor Mendel.

Until now, whole-genome studies probably would have missed these disease genes since they are too rare to have been caught by the net of available technology. The 1000 Genomes Project improves that picture, developing a net that is fine enough and broad enough to capture significantly less common genetic variations that may contribute to disease.

“This is the decade to go answer these 100-year-old questions,” Altshuler said. “There are people who look at this problem and say there aren’t discrete characters that determine you do or don’t have disease, there’s a continuum. There are probably many genes and complex biological processes that affect the development of disease. And there are other people who say, no, I think it’s probably going to be Mendelian. There’s going to be a gene, and when we find it, it will explain a lot.”

Altshuler believes fundamentally that both genetic models of disease have validity. “But in the end,” he said, “we have to do the experiments to find out.”

For more information, students may contact David Altshuler at davidaltshuler@me.com.

Conflict Disclosure: The authors declare no conflicts of interest.

Funding Sources: The Wellcome Trust Sanger Institute in Hinxton, England; Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH)

Disclaimer: The researchers are unable to provide treatment recommendations for individual cases.