Predictive Genomics

Two new technologies define the consequences of genetic variation on a proteome-wide scale

Image: NHGRI

Combining two emerging, large-scale technologies for the first time—multiplexed mass spectrometry and a mouse population with a high level of natural genetic diversity—researchers at Harvard Medical School and The Jackson Laboratory can now crack an outstanding question in biology and medicine: How do genetic variants affect protein levels?

Proteins are chains of amino acids that comprise the structural and functional “parts list” of all cells and organisms. Understanding the regulation of protein expression is therefore critical to understanding normal development and disease.

The central dogma of molecular biology describes how genetic information flows from DNA to RNA to protein. The DNA sequence is first transcribed into messenger RNA, or mRNA, and then the cell’s protein-building machinery translates the mRNA sequence into the amino-acid sequence of the protein.

Given this direct relationship between RNA and proteins, it was widely assumed that protein expression would track closely with mRNA expression. Yet several studies comparing cellular mRNA levels and protein levels have shown a surprisingly high level of discordance between the two, suggesting that one or more mechanisms act to buffer protein levels from genetic variants that affect mRNA levels.
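One common way to put a number on how closely protein tracks mRNA is the Pearson correlation of the two abundances across individuals. The sketch below is illustrative only: the abundance values are invented, and the function name `pearson` is our own, not something taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented log-scale abundances for one gene across five mice.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_tracking = [1.1, 1.9, 3.2, 3.9, 5.1]  # protein follows mRNA
protein_buffered = [2.0, 2.1, 1.9, 2.0, 2.0]  # protein barely responds to mRNA changes

print(round(pearson(mrna, protein_tracking), 3))
print(round(pearson(mrna, protein_buffered), 3))
```

A correlation near zero is what discordance looks like for a single gene. On its own, though, it does not say which mechanism is doing the buffering.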

Previous experiments in mice and human cell lines aimed at identifying these mechanisms have been inconclusive.

To address this puzzle, Gary Churchill joined forces with Steven Gygi. Churchill, a Jackson Lab professor who holds the Karl Gunnar Johansson Chair, is a pioneer in developing the Collaborative Cross and Diversity Outbred mouse populations. Gygi, HMS professor of cell biology, is a leader in the rapidly advancing field of quantitative proteomics, the large-scale measurement of an organism’s entire complement of proteins.

Diversity Outbred mice, bred from eight founder strains, contain extensive genetic variation.

“Our mouse populations have more than 50 million SNPs,” said Churchill, referring to single nucleotide polymorphisms, which are variations in individual DNA building blocks. “Steve can measure the levels of thousands of proteins instead of dozens. This makes an entirely new scale of analysis possible.”

Gygi and Churchill are co-senior authors of a paper in Nature in which they compared mRNA and protein levels in the livers of 192 Diversity Outbred mice.

The researchers identified 2,866 genetic markers that correlate with differences in protein levels across mice (protein quantitative trait loci, or pQTL) and observed two striking patterns. Most proteins with “local” pQTL, where the genetic variant influencing protein abundance lies close to the DNA sequence that encodes that protein, showed strong evidence of transcriptional regulation: their protein levels tracked closely with their mRNA levels.
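The sketch below gives a rough picture of what finding a pQTL involves. It scans genetic markers for association with one protein's abundance and then labels the top hit "local" or "distant" according to how far it sits from the gene encoding the protein. It rests on several assumptions. Genotypes are simplified to 0/1/2 allele dosages, whereas Diversity Outbred analyses typically model the eight founder haplotypes. The LOD score comes from an ordinary least-squares fit. The 10 Mb window that defines "local" is a hypothetical cutoff, not the study's threshold. The `Marker` class, the helper functions and all data are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    chrom: str
    pos_mb: float  # marker position in megabases
    dosage: list   # genotype per mouse coded as 0/1/2 allele copies (simplification)


def lod_score(genotype, phenotype):
    """LOD comparing phenotype ~ genotype against an intercept-only model (OLS)."""
    n = len(phenotype)
    my = sum(phenotype) / n
    mg = sum(genotype) / n
    rss0 = sum((y - my) ** 2 for y in phenotype)
    sgg = sum((g - mg) ** 2 for g in genotype)
    if sgg == 0:  # monomorphic marker carries no information
        return 0.0
    beta = sum((g - mg) * (y - my) for g, y in zip(genotype, phenotype)) / sgg
    rss1 = sum((y - my - beta * (g - mg)) ** 2 for g, y in zip(genotype, phenotype))
    return (n / 2) * math.log10(rss0 / max(rss1, 1e-12))  # guard against a perfect fit


def scan(markers, protein):
    """Return (marker, LOD) for the marker most strongly associated with the protein."""
    return max(((m, lod_score(m.dosage, protein)) for m in markers), key=lambda t: t[1])


def classify(marker, gene_chrom, gene_pos_mb, window_mb=10.0):
    """'local' if the marker lies within window_mb of the protein's own gene, else 'distant'.

    The 10 Mb default is a hypothetical cutoff chosen for illustration.
    """
    if marker.chrom == gene_chrom and abs(marker.pos_mb - gene_pos_mb) <= window_mb:
        return "local"
    return "distant"


# Invented data for six mice; the protein's gene is assumed to sit on chr7 at 52 Mb.
protein = [1.0, 1.1, 2.0, 2.1, 3.0, 2.9]
markers = [
    Marker("A", "7", 50.0, [0, 0, 1, 1, 2, 2]),
    Marker("B", "2", 100.0, [0, 1, 2, 0, 1, 2]),
]
best, best_lod = scan(markers, protein)
print(best.name, round(best_lod, 1), classify(best, "7", 52.0))
```

A real scan would also correct for the number of markers tested, usually with permutation-based significance thresholds. This sketch omits that step.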

In stark contrast, proteins with “distant” pQTL—where the genetic variant influencing protein abundance is located far away from the DNA sequence that encodes the protein—appeared completely uncoupled from their corresponding mRNA’s abundance. By applying a novel statistical approach, they showed that the post-transcriptional effects of many distant pQTL could be attributed to a second protein, revealing an extensive network of direct protein–protein interactions and tightly regulated cellular pathways.
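The article does not name the statistical approach. One standard way to ask whether a second protein carries a distant pQTL's effect is a mediation-style test. You compare the target protein's LOD score at the marker before and after adjusting for a candidate mediator protein. If the LOD collapses once the mediator is in the model, the mediator is a plausible carrier of the effect. The sketch below assumes simple linear models and invented data. It illustrates the idea and is not a reconstruction of the paper's method.

```python
import math


def residualize(y, x):
    """Residuals of y after least-squares regression on x plus an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - beta * (a - mx) for a, b in zip(x, y)]


def rss(residuals):
    """Residual sum of squares."""
    return sum(r * r for r in residuals)


def lod(genotype, target, mediator=None):
    """LOD for adding genotype to a linear model of target, optionally adjusted for a mediator.

    Uses the Frisch-Waugh result: regressing the mediator-adjusted target on the
    mediator-adjusted genotype yields the residuals of the full two-predictor model.
    """
    n = len(target)
    if mediator is None:
        mediator = [0.0] * n  # constant covariate gives an intercept-only null model
    t_res = residualize(target, mediator)
    g_res = residualize(genotype, mediator)
    full = residualize(t_res, g_res)
    return (n / 2) * math.log10(rss(t_res) / max(rss(full), 1e-12))


# Invented data: a distant marker drives the mediator, and the target follows the mediator.
genotype = [0, 0, 1, 1, 2, 2]
mediator = [0.0, 0.2, 1.0, 1.1, 2.0, 1.9]
target = [1.05, 1.15, 2.05, 2.05, 3.05, 2.85]

marginal = lod(genotype, target)
adjusted = lod(genotype, target, mediator)
print(f"LOD without mediator: {marginal:.2f}, with mediator: {adjusted:.2f}")
```

In practice, you would repeat this test over many candidate mediators and keep those that produce the largest LOD drop. A sharp drop is consistent with mediation but does not prove a direct physical interaction.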

The researchers confirmed their findings in Collaborative Cross mice.

“We can now uncover relationships among genes, transcripts and proteins not previously known,” Gygi said. “Our findings suggest a new predictive genomics framework, combining quantitative proteomics and transcriptomics to infer the proteome-wide effects of a specific genetic variant.”

Within this framework, Gygi said, researchers can explore and fine-tune pathways associated with the physical process, disease or characteristic of interest.

The study was supported by Harvard Medical School, The Jackson Laboratory and the National Institutes of Health (grants P50GM076468, F32HD074299, GM67945 and U41HG006673).

Adapted from a Jackson Lab news release.