The Final Word on STAP

Researchers fail to replicate STAP study; computational analysis reveals genomic inconsistency

Failure to replicate controversial study underscores need for greater transparency. Above, Professor George Q. Daley. 

Video: Rick Groleau

Tremendous controversy erupted in early 2014 when two papers published in Nature described how a technique called “stimulus-triggered acquisition of pluripotency,” or STAP, could quickly and efficiently turn ordinary cells into pluripotent stem cells, that is, stem cells capable of developing into all the tissues in the body.

The simplicity of the approach—subjecting the cells to particular stresses like mild acid exposure—seemed too good to be true. And it was.

Almost immediately stem cell researchers around the world began questioning the results, as repeated attempts to replicate the findings failed. After an investigation by the journal revealed many problems and inconsistencies with the data, the papers were retracted.

Despite the retractions, claims persisted that the essential science of STAP was valid and that issues of replication could be solved through refined protocols. As a result, a group of scientists representing seven international laboratories and led by researchers at Harvard Medical School and Boston Children’s Hospital pooled their efforts to replicate STAP, including experiments conducted in the lab where STAP was first developed.

They also went beyond the original experiments and analyzed publicly available genomic sequence data with newly developed bioinformatics algorithms.

Collectively, researchers worldwide were unable to replicate the findings reported in the original STAP papers. These negative results will be published in Nature, along with a companion paper that describes universal hallmarks of pluripotency, providing a roadmap that researchers can use to determine whether they have in fact created induced pluripotent stem cells, or iPS cells.

“The scientific process requires replicating and extending existing data,” said George Q. Daley, HMS professor of biological chemistry and molecular pharmacology at Boston Children’s and co-senior author on both papers addressing the STAP controversy. “We appreciate that can be difficult. We must strive for ever-higher standards of rigor up front, which can be at odds with the rush to publish in this increasingly competitive environment.”

One experiment that the researchers sought to replicate involved a gene called Oct4, one of the most consistent markers of iPS cells. Most scientists agree that Oct4 is essential for pluripotency. To test for Oct4, researchers use a green fluorescent protein that activates when Oct4 is present. In the original STAP studies, the researchers did in fact detect green fluorescence in the cells, leading them to believe that they had induced pluripotency.

However, when Alejandro De Los Angeles, a scientist in the Daley lab, repeated the protocol, he noticed what researchers call “autofluorescence,” a tendency for some molecules in cells to emit light randomly when excited by lasers. The lasers used to detect green fluorescence require proper filters to separate true signal from noise. After exposing cells to the original acid treatment and using the appropriate laser filters, the researchers detected no active presence of Oct4.
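One common way to tell a genuine reporter signal from autofluorescence can be sketched in a few lines of Python. The sketch rests on a stated assumption that is not taken from the study itself: autofluorescence tends to appear across several detection channels, while a green fluorescent protein signal is largely confined to the green channel. The function name, the channel values and both thresholds are hypothetical and chosen only for illustration.

```python
def is_true_gfp(green: float, red: float,
                min_green: float = 100.0, min_ratio: float = 5.0) -> bool:
    """Return True if a cell looks like a genuine green-reporter signal.

    Assumption (illustrative only): autofluorescent cells are bright in
    both the green and red channels, so a real reporter signal must be
    bright in green AND much brighter in green than in red.
    """
    bright_enough = green >= min_green
    green_specific = green >= min_ratio * max(red, 1e-9)
    return bright_enough and green_specific


# Hypothetical intensity readings (arbitrary units) for three cells.
print(is_true_gfp(green=1000.0, red=20.0))   # bright and green-specific
print(is_true_gfp(green=1000.0, red=800.0))  # bright in both channels
print(is_true_gfp(green=50.0, red=1.0))      # too dim to count
```

Under these assumed thresholds, only the first cell would be counted as positive; the second is the pattern that autofluorescence would be expected to produce.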

Another hallmark of pluripotent stem cells is their ability to form teratomas, benign tumors that arise when stem cells injected into mice differentiate into multiple tissues. While the original STAP papers claimed to have found teratomas, researchers attempting to reproduce teratomas from STAP preparations discovered adverse chemical reactions that could have been mistaken for teratoma formation. Aside from this, no teratomas were found.

To re-examine the original experiments, Peter Park, HMS associate professor of biomedical informatics, developed a set of algorithmic tools to analyze the genomic data from the study. He refers to this approach as “forensic bioinformatics.”

At first this was challenging because publicly available data sets from the original study were incomplete and poorly labeled. But once Park’s team members had gathered enough data, they were able to determine in less than a month that the initial studies were problematic.

Inferring genetic variants in the DNA of the cells from gene expression data, Francesco Ferrari, a postdoctoral fellow in the Park lab, and his colleagues found that many of the cells described as STAP cells were genomically distinct from their predecessors. In some cases, they were even of different sexes. In one critical experiment where STAP-derived cells were reported to behave like both embryonic and placental stem cells, the analysis showed that the cell populations were in fact a mixture of embryonic and placental stem cells that pre-existed in the lab.
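The general idea behind this kind of identity check can be illustrated with a minimal Python sketch. It is not the Park lab's actual method or code. It assumes that genotypes have already been called at a set of variant sites for each sample and that expression values are available for a few mouse Y-linked genes; the site names, genotype calls, expression values and thresholds below are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical genotype calls: variant site -> genotype ("0/0", "0/1", "1/1").
Genotypes = Dict[str, str]


def concordance(a: Genotypes, b: Genotypes, min_shared: int = 3) -> float:
    """Fraction of sites called in both samples that have the same genotype."""
    shared = [site for site in a if site in b]
    if len(shared) < min_shared:
        raise ValueError("too few shared sites to compare")
    matches = sum(1 for site in shared if a[site] == b[site])
    return matches / len(shared)


def looks_male(expression: Dict[str, float],
               y_genes: Sequence[str] = ("Ddx3y", "Eif2s3y", "Kdm5d", "Uty"),
               threshold: float = 1.0) -> bool:
    """Guess sex from expression: any Y-linked gene above the threshold."""
    return any(expression.get(gene, 0.0) > threshold for gene in y_genes)


# Illustrative data only: a parental line and two lines said to derive from it.
parent = {"chr1:100": "0/1", "chr2:200": "1/1", "chr3:300": "0/0", "chr4:400": "0/1"}
derived_same = dict(parent)
derived_other = {"chr1:100": "0/0", "chr2:200": "0/1", "chr3:300": "0/0", "chr4:400": "1/1"}

print(concordance(parent, derived_same))   # identical calls at every shared site
print(concordance(parent, derived_other))  # agrees at only one of four sites
print(looks_male({"Ddx3y": 5.0}), looks_male({"Xist": 20.0}))
```

A derived cell line should match its parental line at nearly every site and share its sex; a low concordance or a sex mismatch, as in the second comparison, is the kind of inconsistency the analysis described above would flag.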

“At the very least, journals should enforce proper annotation and timely deposition of datasets into public databases,” said Park. “It won’t prevent this sort of thing from ever happening again, but it is an easily attainable safeguard.”

Furthermore, Park emphasized the importance of careful bioinformatic analysis in these studies, noting that “if the authors, their colleagues or the referees of the manuscripts had the right expertise in genomic data analysis, the STAP cell idea could have been discredited much earlier with the data they had already generated. That would have saved so much time and effort for researchers around the world who tried to replicate the findings.”

“Ultimately, we need to have more checks and balances in science,” said Daley, who is also an investigator of the Howard Hughes Medical Institute. “Incentives in the system are so stacked toward being productive and publishing and getting grants that it can lead even very well-intentioned people into too easily accepting their own cognitive biases.”