Panning for Genetic Gold

Machine learning tool pinpoints disease-related genes, functions

White hand holding tweezers reaches for a small piece of gold in a blue dish full of wet sand
Image: FastGlassPhotos/iStock/Getty Images Plus

The idea struck Robert Ietswaart, a research fellow in genetics at Harvard Medical School, while he was trying to determine how an experimental drug slowed the growth of lung cancer cells.

He saw that the drug triggered a cascade of molecular and genetic changes in the cells, but he needed to narrow down which of the many activated genes were actually beating back the cancer rather than doing unrelated jobs. And, given that individual genes often do more than one thing—some even perform more than 100 different tasks—he needed to figure out which jobs the key genes were doing in these cells.

There were so many options that Ietswaart didn’t know where to start.


Researchers in this position normally rely on experience, and sometimes software, to sift through the sludge of candidate genes and identify the gold nuggets that cause or contribute to a disease or amplify the effects of a drug. Then they investigate how those genes may be operating by poring over archives of scientific literature. This gives them a better springboard from which to dive into experiments.

Ietswaart, who trained in computational biology, had a better idea: build a tool that would automatically find and rank the most important genes and gene functions. Existing tools could gauge which biological processes were relevant to an experiment but didn't rank individual genes or functions.

“I realized that many researchers struggle with the same questions,” said Ietswaart. “So, I decided to build something that would be useful not only for me but for the broader scientific community.”

The fruits of that labor—a collaboration between the labs of geneticist Stirling Churchman and systems pharmacologist Peter Sorger at HMS—were published Feb. 2 in Genome Biology.

The tool, dubbed GeneWalk, uses a combination of machine learning and automated literature analyses to indicate which genes and functions are most likely relevant to a researcher’s project.

“It’s the conundrum of so many biology labs these days: We have a list of 1,000 genes and we need to figure out what to do next,” said Churchman, associate professor of genetics in the Blavatnik Institute at HMS and senior author of the paper. “We have a tool that helps you figure out not only which genes to follow up on but also what those genes are doing in the system you're studying.”

We’re filling a gap that a lot of people didn’t think was possible to fill.

Stirling Churchman

Machine learning in biomedical research is very much about making each step along the way a little easier.

Robert Ietswaart