Drawing on the largest resource of its kind, members of the Exome Aggregation Consortium (ExAC) have reported findings from the exome sequences (the protein-coding portions of the genome) of 60,706 people of diverse ethnic backgrounds.
The ExAC team, whose analysis appears August 18 in Nature, is led by scientists at the Broad Institute of MIT and Harvard, Harvard Medical School and Massachusetts General Hospital.
Containing more than 10 million DNA variants—many very rare and most identified for the first time—the ExAC dataset is a freely available, high-resolution catalog of human genetic variation that has already made a major impact on clinical research and diagnosis of rare genetic diseases.
The newly published ExAC analysis reveals properties of genetic variation that were undetectable in smaller data sets, including the first direct observation of mutations that arose multiple times independently among the samples, a phenomenon known as “mutational recurrence.” The work also identifies a class of genes that harbor less variation than expected, most likely because disease-causing variants in these genes are so detrimental to human health that they are rare or absent in the population.
With immediate utility for clinical applications, the study further shows that the ExAC database improves the ability to evaluate candidate pathogenic variants in rare disease.
Shared data
“The success of ExAC was made possible by the willingness of our colleagues in many large, disease-focused consortia to openly share sequencing data,” said Daniel MacArthur, HMS assistant professor of medicine at Mass General and senior author of the study. He is also co-director of the Program in Medical and Population Genetics at the Broad.
Previous resources contained only a few thousand exomes without much diversity, so they were inadequate for studies of rare disease genes.
“The scale and diversity of the ExAC resource is invaluable,” MacArthur said. “It gives us the ability to discover extremely rare variants and offers an unparalleled window into the roots of rare genetic diseases.”
After collecting tens of thousands of human exomes from around the globe, the consortium produced a catalog of human genetic variation of unprecedented resolution—roughly one variant every eight bases, or letters, of DNA. Many of these variants had never been reported and most are very rare, occurring in fewer than 1 in 10,000 people.
Clinical applications
With a patient’s genome sequence in hand, a clinician can compare any rare mutations found in his or her genome with those in the ExAC database, shedding light on the genes and proteins that may underlie a patient’s disorder. A variant found in a patient’s DNA sequence that is extremely rare in ExAC, especially one that is predicted to disrupt the function of the resulting protein, then becomes a key suspect in causing the rare disease.
Since its release to the scientific community in October 2014, the ExAC resource has had more than 5 million page views online and has allowed clinicians to provide more-accurate genetic diagnoses for thousands of rare disease patients.
“The ExAC resource gives us incredible insight when evaluating a patient’s genome sequence in the clinic,” said Heidi Rehm, HMS associate professor of pathology at Brigham and Women’s Hospital, medical clinical director of the Broad’s Clinical Research Sequencing Platform and chief laboratory director of the Laboratory for Molecular Medicine at Partners HealthCare Personalized Medicine.
In clinical sequencing, many DNA variants are rare or understudied or both, so it is unclear if they have any effect on disease risk and whether they should be taken into consideration when diagnosing and treating patients. By looking at the frequencies of a patient’s variants in the ExAC database, Rehm and her team can rule out those that are relatively common, which allows them to home in more quickly on the true disease-causing variants and avoid costly follow-up on benign ones.
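The frequency check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the ExAC browser's interface or the pipeline used by Rehm's laboratory: the `Variant` record, the `rare_candidates` function, the example counts and the 1-in-10,000 cutoff are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one variant found in a patient."""
    name: str
    allele_count: int    # times the variant allele was seen in the reference data
    allele_number: int   # total alleles genotyped at that site in the reference data


def allele_frequency(variant):
    """Fraction of reference alleles that carry the variant."""
    if variant.allele_number == 0:
        return 0.0  # site not covered: treat as unobserved
    return variant.allele_count / variant.allele_number


def rare_candidates(variants, max_frequency=1e-4):
    """Keep variants at or below the cutoff (an assumed, illustrative value)."""
    return [v for v in variants if allele_frequency(v) <= max_frequency]


# Made-up counts, for illustration only.
patient_variants = [
    Variant("common_variant", 6000, 120000),
    Variant("rare_variant", 2, 120000),
    Variant("unseen_variant", 0, 120000),
]

for v in rare_candidates(patient_variants):
    print(v.name, allele_frequency(v))
```

In this sketch the relatively common variant is set aside, while the rare and unobserved variants remain as candidates for closer review.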
New rare disorders
The resource has also been used by researchers to identify dozens of new rare genetic disorders.
“In our own research, using the ExAC resource has allowed us to apply novel statistical methods to identify several new severe developmental disorders,” said Matthew Hurles, a researcher at the Wellcome Trust Sanger Institute and frequent user of the ExAC database. “Resources such as ExAC exemplify the benefits that can be achieved for families coping with rare genetic diseases, as a result of the mass altruism of many research participants who allow their data to be aggregated and shared.”
The ExAC database is also being used by researchers exploring the more fundamental effects of genetic variation, such as looking at variation in transcription factor proteins and its impact on protein-protein interaction networks.
Variation that was expected, but not found, in the data offered new insight. Some genes were found to have fewer than the expected number of missense mutations, which change the protein sequence, or loss-of-function mutations, which obliterate protein function. With such a large sample size, the researchers were able to quantify the deficit of these types of mutation per gene and identify a few thousand “highly constrained” genes for which natural selection has weeded out these mutations because their effects are so detrimental.
Even when nothing is known about the diseases involved, and often when no instances of these mutations appear in the ExAC database at all, the “missing variation” indicates that mutations in these highly constrained genes are likely to cause severe disease. If clinical or research sequencing reveals a loss-of-function or missense mutation in one of these genes in a patient’s genome, that mutation becomes a strong candidate for causing the patient’s rare disease.
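As a rough illustration of what a per-gene mutation deficit means, the sketch below compares observed and expected mutation counts. It uses a plain observed-to-expected ratio, which is a simplification and not the statistical model used in the study; the function names, the 0.1 cutoff and the counts are hypothetical.

```python
def observed_expected_ratio(observed, expected):
    """Ratio of mutations actually seen in a gene to the number expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_highly_constrained(observed, expected, max_ratio=0.1):
    """Flag a gene whose observed count falls far below expectation.

    The 0.1 cutoff is an assumed value for illustration only.
    """
    return observed_expected_ratio(observed, expected) <= max_ratio


# Made-up (observed, expected) loss-of-function counts for two hypothetical genes.
genes = {
    "GENE_A": (1, 40),   # strong deficit
    "GENE_B": (35, 40),  # close to expectation
}

for name, (obs, exp) in genes.items():
    print(name, observed_expected_ratio(obs, exp), is_highly_constrained(obs, exp))
```

A gene like the hypothetical GENE_A, with far fewer mutations than expected, is the kind the article describes as highly constrained.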
The ExAC data also showed that more than 100 mutations previously reported as disease-causing are actually benign, reducing the number of these false-positive findings in databases widely used by clinical labs. This finding demonstrates the value of the ExAC database in assessing claims that specific mutations cause disease.
“With its large sample size and high resolution across many populations, the ExAC database provides much greater power to interpret rare disease-causing variants than ever before, even for common diseases,” said Jose Florez, HMS associate professor and chief of the Diabetes Unit at Mass General and an institute member at the Broad.
Adapted from a Broad Institute news release.