In an empirical science like biology, in which experimentation and observed results rule the day, should data speak for itself? Well, not necessarily, said Peter Sorger. It depends on your philosophy.
In the 4th century BC, Plato described data interpretation best—and first—with his Allegory of the Cave. Prisoners chained to a wall inside a cave see only shadows dancing before them, projections of unknowable creatures cast by an unseen light.
Similarly, the experimental data Sorger uses to map protein signaling networks more closely resemble shadowy projections than true molecular interactions. When researchers evaluate chemical changes like phosphorylation or a cell’s fluorescent glow, which are indirect measures of protein interactions, “we are looking at shadows,” said HMS research fellow Julio Saez-Rodriguez.
The key question is whether mathematical models should be used to help scientists go beyond experimental insight to better understand the shadows. This question is at the heart of Sorger and his team’s participation in a recent scientific challenge. Called DREAM, the challenge invites scientists from around the world to take donated experimental data representing a protein network and formulate a model of what is actually going on. Challengers use their interpretation to revise the initial network and make predictions about the network’s behavior under different chemical stimuli. The goal is to find the best computational approach.
Organized in part by computational biologist Gustavo Stolovitzky of IBM’s Watson Research Center, DREAM, which runs several challenges concurrently, has been around for four years. Sorger, HMS professor of systems biology, has participated in the last two challenges.
Cancerous Behavior
This year, Sorger’s donated data included the results of high-throughput biochemical experiments collected by former HMS research fellow Leonidas Alexopoulos. Sorger’s lab stimulated cells with signaling molecules, such as growth and death factors, and measured the resulting protein products. The cells included both healthy and cancerous liver cells. Sorger’s ultimate goal, he said, is to determine how these two versions of the same cell type differ, in the hopes that these differences might point to potential new therapies.
When Sorger met Stolovitzky three years ago at an early DREAM conference, the two realized they had common long-term goals. “But our interests were separated by the people we knew,” said Sorger. Stolovitzky, for example, had no experience with the kind of physiological, protein-based measurements Sorger and his team were making.
“There’s a large community of computer scientists and applied mathematicians who are interested in these biological problems, but they don’t have access to the data,” said Sorger. “DREAM is a way to bring the experimental and analytical communities together.”
Stolovitzky envisions DREAM evolving from the current, annual challenge, which he calls a set of “community experiments,” into ongoing communal research facilitated by openly available data and model repositories and a discussion forum.
In this year’s competition, 12 teams from labs spanning the globe submitted their analyses to Sorger’s challenge. The two highest-scoring teams approached the problem very differently. One of them, from Italy, opted for a Boolean model, boiling down stimulus–response interactions in a signaling network to ones and zeros. The other, from South Carolina, chose a complex model using ordinary differential equations to predict network interactions.
In follow-on work, a group from MIT is collaborating with Sorger on a fuzzy-logic model, which blurs the Sorger team’s own Boolean model by adding the concept of strong and weak stimulus–response connections.
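As a rough illustration of the difference between a Boolean connection and a fuzzy-logic one, the sketch below contrasts the two for a single hypothetical stimulus–response link. The function names and the `strength` weight are assumptions made for illustration; they are not taken from the Sorger team's model or from the MIT group's work.

```python
def boolean_response(stimulus_on: bool) -> int:
    """Boolean view: the response is either fully on (1) or fully off (0)."""
    return 1 if stimulus_on else 0


def fuzzy_response(stimulus_level: float, strength: float) -> float:
    """Fuzzy view: stimulus and response take graded values between 0 and 1.

    `strength` is a hypothetical weight (0 = no connection, 1 = strong
    connection) describing how strongly the stimulus drives the response.
    """
    if not 0.0 <= stimulus_level <= 1.0:
        raise ValueError("stimulus_level must lie in [0, 1]")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return stimulus_level * strength
```

In this toy version, a fully applied stimulus passing through a weak connection (say, `strength=0.25`) yields only a partial response, something a pure ones-and-zeros model cannot express.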
It Takes a Village
DREAM co-organizer Robert Prill, also from IBM research, scores each model based on how accurately it predicts measurements excluded from the challenge data. This rigorous evaluation, he said, suggests “writing a paper and getting it published is not necessarily the best way to progress.”
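The idea of scoring a model against measurements it never saw can be sketched in a few lines. The article does not say which metric the DREAM organizers use, so the mean squared error below is an assumption, as are the function names and the shape of the inputs.

```python
def prediction_error(predicted, measured):
    """Mean squared error between a model's predictions and withheld data.

    Mean squared error is an assumed metric chosen for illustration only.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sum(squared) / len(squared)


def rank_models(predictions_by_team, withheld_measurements):
    """Return team names ordered from lowest to highest prediction error."""
    errors = {
        team: prediction_error(predicted, withheld_measurements)
        for team, predicted in predictions_by_team.items()
    }
    return sorted(errors, key=errors.get)
```

The important design point is that `withheld_measurements` is never shown to the modelers, so a low error reflects real predictive power rather than a good fit to data the model was built on.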
The act of formalizing the interpretation of experimental data—by putting intuitive understanding into mathematical terms—reveals both the power and the inadequacy of any given model. In Plato’s cave, for instance, one model may show a head with one pointed ear, the other with two, neither of which reveals the true cat casting the shadow.
“Depending on the question and the quality of data, some modeling approaches are better than others,” said Saez-Rodriguez. “But a community of approaches is best.”
The DREAM Team
Systems biology professor Peter Sorger and his lab publicized experimental data as part of a scientific challenge called DREAM, which invites researchers worldwide to develop a model that explains the submitted data. They also performed their own, independent analysis and devised their own model. Similar to the winning submission from an Italian team, they based their interpretation of the data on a Boolean model of network stimulus–response interactions. The resulting model was published online Dec. 1 in Molecular Systems Biology.
Though the models had common features, their approaches differed. “Our approach was driven by the starting network,” said co–first author and research fellow Julio Saez-Rodriguez. “Theirs by the data alone.”
The Sorger team’s model consists of a refined network-signaling diagram that includes new pathways and excludes others. The team expected such pruning and extending because the starting network was generic, created from data found in hundreds of thousands of papers based on research using a variety of techniques, cell types, and organisms.
The model also includes an “executable,” said Saez-Rodriguez, a computer program “trained” on the experimental data. In the same way that a neural network learns how to behave based on input data, the team’s model learns how to predict the responses to different stimuli by incorporating experimental stimulus–response data.
This modeling exercise is helping Sorger’s lab learn more about liver cancer. In unpublished work, by comparing the different models for each cell line, they found three significant protein signaling pathway differences between liver cancer cells and healthy cells.
One pathway is only active in cancer, said Saez-Rodriguez, “which makes sense because it is connected to growth.” Another, connected to protection of a cell, is only active in healthy cells. The third difference is one of wiring; in healthy cells, two stimuli must be present to activate the pathway, as in a logical AND gate, while in tumor cells, only one is necessary for activation.
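The wiring difference described above can be written out as two small logic gates. This is a minimal sketch: the article names the healthy-cell behavior as an AND gate, while modeling the tumor-cell behavior as an OR gate is an interpretation of "only one is necessary." The stimulus names `a` and `b` are placeholders, since the actual stimuli are not identified.

```python
from itertools import product


def healthy_cell_pathway(a: bool, b: bool) -> bool:
    """AND gate: both stimuli must be present to activate the pathway."""
    return a and b


def tumor_cell_pathway(a: bool, b: bool) -> bool:
    """OR gate (assumed): either stimulus alone activates the pathway."""
    return a or b


# Stimulus combinations for which the two wirings give different answers.
differing_inputs = [
    (a, b)
    for a, b in product([False, True], repeat=2)
    if healthy_cell_pathway(a, b) != tumor_cell_pathway(a, b)
]
```

Under these assumptions the two cell types disagree only when exactly one stimulus is present, which is the kind of condition an experiment could test directly.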
These differences represent hypotheses for further investigation. “We have yet to confirm that this is going to have any impact in the clinic,” said Sorger, the senior author. “But the hope is that, reasonably near-term, it will influence how we think about therapy for patients.”
Students may contact Peter Sorger at peter_sorger@hms.harvard.edu for more information.
Conflict Disclosure: The authors declare no conflicts of interest.
Funding Sources: The National Institutes of Health; the authors are solely responsible for the content of this work.