Scattered through the dark matter of the genome—the vast expanse of DNA that does not code for protein—lie single nucleotides so nearly identical between human beings and other mammals that researchers have concluded they must have been acted on by natural selection to carry out some function.

Though it has been called junk DNA, the nearly 98 percent of the genome that does not code for protein is known to contain hidden gems—bits of genetic material that carry out regulatory and other functions. But these were thought to show up as continuous sequences of conserved DNA rather than as lone nucleotides. Saurabh Asthana, Shamil Sunyaev, and Gregory Kryukov, working with scientists at the University of Washington, took a set of 567 human genes, along with introns and adjacent noncoding regions, from more than 90 people and combed through it for signs of selection. To their surprise, a large number of solitary noncoding nucleotides bore the stamp of selection, meaning that they were essentially unchanged among individuals. In fact, the overall number of conserved single nucleotides was greater than the number of conserved continuous sequences in the noncoding regions of their sample.

“We’ve found that if you limit yourself to the contiguous conserved elements in the noncoding parts of the genome, then you’re really missing out on a lot of the story,” said Asthana, a graduate student in Sunyaev’s lab. “Even based on our most conservative estimate, it appears that the majority of functional positions in the noncoding genome are in parts that we have not looked in.”

Not all of the solitary nucleotides necessarily carry out a function, said the researchers. Some may look similar simply by chance or because their mutation rate is low. “What I can say is that out of the nucleotides that are identical in humans and other mammals, we estimate that not less than 20 percent are functional,” said Sunyaev, HMS assistant professor of medicine and health sciences and technology at Brigham and Women’s Hospital.

Questions Multiply

Figuring out which nucleotides are functional and what they are doing will be a challenge. Indeed, the findings, which appear in the July 24 Proceedings of the National Academy of Sciences, deepen the mystery of the genome's dark matter: how could single nucleotides play such an important role?

“The main question that’s puzzling me a lot is, what is the function of these nucleotides and how do we find them?” said Sunyaev.

The Russian-born researcher had spent years ferreting out traces of natural selection in the genome when he was approached by colleagues from the University of Washington. John Stamatoyannopoulos and William Noble had become interested in a group of 567 genes from more than 90 human subjects and wanted to know which of the 78,472 nucleotide variants in the sample, coding and noncoding, had been shaped by natural selection.

Usually selection works by weeding out deleterious mutations, a process called negative or purifying selection, which in turn reduces the number of positions along a chromosome at which nucleotides will differ among individuals. Should a mutation arise at a particular position, it will have a hard time getting established and will occur only rarely in a population. (If selection were not operating, a mutation might increase in frequency due to chance or a high mutation rate.)
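The two signals described here, fewer variable positions and variants that stay rare, can be made concrete with a small calculation. The sketch below is illustrative only and is not the researchers' method: the function names, the 5 percent rarity cutoff, the 180-chromosome sample size, and the toy allele counts are all assumptions chosen for the example.

```python
def site_diversity(alt_count, n_chrom):
    """Expected heterozygosity at one biallelic site (small-sample corrected)."""
    p = alt_count / n_chrom
    return 2 * p * (1 - p) * n_chrom / (n_chrom - 1)


def summarize(alt_counts, n_chrom, rare_maf=0.05):
    """Return (mean per-site diversity, fraction of variable sites that are rare).

    alt_counts: copies of the alternative allele at each position.
    rare_maf:   hypothetical cutoff below which a variant counts as rare.
    """
    mean_div = sum(site_diversity(c, n_chrom) for c in alt_counts) / len(alt_counts)
    # Only positions where both alleles are present count as variable.
    variable = [c for c in alt_counts if 0 < c < n_chrom]
    rare = [c for c in variable if min(c, n_chrom - c) / n_chrom < rare_maf]
    rare_frac = len(rare) / len(variable) if variable else 0.0
    return mean_div, rare_frac


# Toy data: 90 people = 180 chromosomes (assumed for illustration).
N_CHROM = 180
constrained = [0, 0, 1, 2, 0, 1]      # mostly invariant, variants are rare
unconstrained = [40, 90, 3, 0, 60, 10]  # more variable, variants are common

print(summarize(constrained, N_CHROM))
print(summarize(unconstrained, N_CHROM))
```

In this toy comparison the constrained set shows lower mean diversity and a higher share of rare variants, which is the pattern purifying selection is expected to leave.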

Taking these two hallmarks, low nucleotide diversity and rarely occurring alternative alleles, as their guide, Sunyaev and Asthana, working with Kryukov, an HMS fellow in medicine, and with the Seattle-based researchers, set out to discover which of the coding and noncoding positions in the sample had been shaped by purifying selection. The first step was to identify conserved nucleotides in both the coding and noncoding regions, which they did by comparing them with the corresponding bases in four mammalian genomes: chimp, dog, mouse, and rat. Of these conserved positions, the coding nucleotides, whose evolution tends to be driven by purifying selection, exhibited the lowest diversity and the greatest preponderance of rare variants.

But even among conserved noncoding positions, the researchers found telltale signs of selection—reduced diversity and an excess of rare alleles. “That was kind of cool,” said Asthana. “We wanted to see if we could explain this using the best known theory about where functional sequences should be, namely in the conserved noncoding sequences.” They eliminated the conserved noncoding sequences (CNSs) from their sample, expecting to see the signals disappear. But they persisted and in only slightly diminished form.
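The filtering logic in these steps, finding positions that match across species and then setting aside those that sit inside contiguous conserved stretches, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the study's pipeline: the toy alignment, the function names, and the five-base block threshold are hypothetical, and real conserved noncoding sequences are defined with more sophisticated methods.

```python
def conserved_positions(alignment):
    """Indices where every species carries the same base (gap columns excluded)."""
    seqs = list(alignment.values())
    return [
        i
        for i, column in enumerate(zip(*seqs))
        if column[0] != "-" and len(set(column)) == 1
    ]


def split_by_block(positions, min_block=5):
    """Separate conserved positions lying in runs of at least min_block
    consecutive bases from those that stand alone or in shorter runs."""
    in_blocks, isolated = [], []
    run = []
    for pos in positions + [None]:  # None is a sentinel that flushes the last run
        if run and (pos is None or pos != run[-1] + 1):
            (in_blocks if len(run) >= min_block else isolated).extend(run)
            run = []
        if pos is not None:
            run.append(pos)
    return in_blocks, isolated


# Toy five-species alignment (made-up sequences of equal length).
alignment = {
    "human": "ACGTACGTTAGC",
    "chimp": "ACGTACGTTAGC",
    "dog":   "ACGTATGATCGA",
    "mouse": "ACGTACATTAGC",
    "rat":   "ACGTACGTTATC",
}

conserved = conserved_positions(alignment)
in_blocks, isolated = split_by_block(conserved)
print(conserved)   # [0, 1, 2, 3, 4, 8]
print(in_blocks)   # [0, 1, 2, 3, 4]
print(isolated)    # [8]
```

Dropping the positions in `in_blocks` and repeating the diversity and rare-variant measurements on `isolated` alone mirrors, in spirit, the test the researchers describe: if the signal survives, it is not coming only from the conserved blocks.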

“The result was completely opposite of what we thought. We saw every single signature of selection, and we couldn’t kill the effect,” said Sunyaev.

“We pretty much didn’t believe it,” Asthana said. The researchers have confirmed their findings with the HapMap of human single nucleotide polymorphisms (SNPs).

Potential Disease Ties

Based on their preliminary investigation, the scientists estimate that many more noncoding nucleotides are under natural selection than previously thought. Of these, 71.4 percent are outside conserved noncoding sequences, which points back to a central conundrum.

“Nobody really has a good idea what CNSs are,” Asthana said, adding that some may occur in regions devoted to regulatory functions. “But now you have positions in the middle of nowhere that are not part of a conserved block. The question is, what is going on? How can you have functional positions just lying in the middle of nowhere?” One possibility is that they may be part of transcription factor binding sites, which can be relatively short. “Or they might be histone position sequences, which tell you how to wrap your DNA,” he said.

Exploring these possibilities and also the more basic question of where the functional nucleotides are located will be difficult, but the payoff could be high. “With rare mendelian diseases such as cystic fibrosis, 98 percent of variants are located in coding regions,” Sunyaev said. “But when we start looking at complex disease—heart disease, diabetes, and so forth—it’s very different. Noncoding variants are probably contributing to these complex phenotypes. The question is, where are they and how do they do it?”