Decoding Cancer's Signature

New algorithm IDs patients harboring a tumor-causing defect found in multiple cancers 

Medications known as PARP inhibitors have emerged as a promising therapy for several forms of cancer fueled by a defect in the cells’ DNA repair machinery.

Yet many people whose cancers are caused by the defect, known as HR deficiency, and who stand to benefit from PARP inhibitors remain unidentified because standard genetic panels used in the clinic do not reliably detect it.

Now, scientists from Harvard Medical School have designed an algorithm that can successfully “read” the molecular signature of the cancer-driving defect and identify patients who could benefit from treatment with PARP inhibitors.

If incorporated into standard gene panel tests, the researchers said, the algorithm could greatly expand the pool of patients who stand to benefit from PARP-inhibitor therapy but are not currently getting it.

A report on the work is published April 15 in Nature Genetics.

“Pinpointing actionable genetic biomarkers and treating patients with drugs that specifically target the relevant cancer-driving pathways is at the heart of precision medicine. We believe our algorithm can greatly enhance physicians’ ability to deliver such individualized therapy,” said study senior author Peter Park, professor of biomedical informatics in the Blavatnik Institute at HMS.

PARP inhibitors are most commonly used in patients with breast cancer who have mutations in their BRCA genes. BRCA mutations can interfere with the cells’ HR machinery, a mechanism used by cells to mend harmful DNA breaks. Yet not everyone with an HR deficiency has a BRCA mutation. In fact, many people with breast cancer who harbor HR defects do not have BRCA mutations. As a result, the most commonly used genetic tests, which are designed to look for BRCA mutations, miss the underlying HR deficiency that can give rise to breast, ovarian, pancreatic and other cancers. Thus, a number of cancer cases fueled by HR gene defects remain undetected by standard gene assays, the researchers said.

“We suspect there are many more patients without BRCA mutations who could benefit from PARP inhibitors, but doctors do not know which ones they are. Our approach could help close that gap,” Park said.

Computer to clinic

Given that HR gene defects underlie multiple cancers, researchers say they hope the new algorithm could be swiftly incorporated into genetic tests already used in hospitals.

“Tens of thousands of patients with cancer are profiled with gene panels across many hospitals and we believe our algorithm can detect the molecular footprints of the underlying cancer-causing defects with much greater sensitivity,” said study first author Doga Gulhan, a post-doctoral researcher in the department of biomedical informatics at HMS. “The overarching goal of such testing is to help clinicians determine the optimal treatment for each patient based on the absence or presence of a given gene defect.”

The team’s analysis suggests that in the case of breast cancer alone, incorporating the algorithm into current genetic panels would double the number of patients who could benefit from PARP inhibitors. Of the 270,000 new breast cancer cases diagnosed in 2018, between 5 percent and 10 percent (13,500 to 27,000) are attributed to BRCA defects. Using computer simulation analysis, the researchers identified twice as many cases of breast cancer (27,000 to 54,000) bearing the genomic footprint of HR defects without BRCA mutations.
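The arithmetic behind these estimates can be checked with a short sketch. The figures are taken from the paragraph above; the function and variable names are illustrative assumptions and are not part of the study.

```python
# Back-of-the-envelope check of the case estimates quoted above.
# Only the numbers come from the article; every name here is illustrative.

NEW_BREAST_CANCER_CASES_2018 = 270_000
BRCA_PERCENT_RANGE = (5, 10)   # 5 to 10 percent attributed to BRCA defects
NON_BRCA_MULTIPLIER = 2        # "twice as many" HR-deficient cases without BRCA mutations


def case_range(total, percent_range, multiplier=1):
    """Return (low, high) case counts for a percent range, scaled by multiplier."""
    low, high = percent_range
    return (total * low * multiplier // 100, total * high * multiplier // 100)


brca_cases = case_range(NEW_BREAST_CANCER_CASES_2018, BRCA_PERCENT_RANGE)
non_brca_hr_cases = case_range(
    NEW_BREAST_CANCER_CASES_2018, BRCA_PERCENT_RANGE, NON_BRCA_MULTIPLIER
)

print(brca_cases)         # (13500, 27000)
print(non_brca_hr_cases)  # (27000, 54000)
```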

Patients with breast cancer who have BRCA gene defects—reliable but not exclusive markers of HR deficiency—are already treated successfully with PARP inhibitors. But the researchers point to emerging evidence suggesting many other cancers driven by HR defects could also benefit from treatment with PARP inhibitors. Because the new algorithm detects the molecular signature of the HR defect, the presence of the signature could be used as a predictive biomarker for response to PARP inhibitors in a range of cancers, Park said.

The language of cancer

Each tumor has a language of its own and leaves a written molecular trail of its origins.

Cancer mutations can arise from inherited malformations in gene structure or from environmental causes such as UV radiation or cigarette smoke. Each of these disruptors causes idiosyncratic alterations in the cells’ DNA. As a result, the letters of the DNA strand get scrambled—a genetic spelling error that can give rise to cancer. The new algorithm is capable of identifying characteristic patterns of such spelling errors to detect the presence of the HR defect.
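To make the idea of matching "spelling error" patterns concrete, here is a toy sketch. It is not the researchers' algorithm: it simply tallies a sample's mutations into the six base-substitution classes conventionally used in such analyses and compares the result with reference patterns using cosine similarity, a common and simple similarity measure. The reference patterns, their names and the sample are invented for illustration only.

```python
import math
from collections import Counter

# The six base-substitution classes conventionally used when tallying mutations.
SUBSTITUTION_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Invented reference patterns, for illustration only. These are NOT real
# mutational signatures and do not come from the study.
HYPOTHETICAL_REFERENCES = {
    "pattern_A": [0.10, 0.05, 0.60, 0.05, 0.15, 0.05],
    "pattern_B": [0.20, 0.15, 0.20, 0.15, 0.15, 0.15],
}


def mutation_profile(mutations):
    """Turn a list of substitution labels into a normalized frequency vector."""
    counts = Counter(mutations)
    total = sum(counts[c] for c in SUBSTITUTION_CLASSES)
    if total == 0:
        raise ValueError("no recognized mutations in sample")
    return [counts[c] / total for c in SUBSTITUTION_CLASSES]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(mutations):
    """Name of the reference pattern most similar to the sample's profile."""
    profile = mutation_profile(mutations)
    return max(
        HYPOTHETICAL_REFERENCES,
        key=lambda name: cosine_similarity(profile, HYPOTHETICAL_REFERENCES[name]),
    )


# A made-up sample of 20 mutations dominated by C>T changes.
sample = ["C>T"] * 12 + ["T>C"] * 3 + ["C>A"] * 2 + ["C>G", "T>A", "T>G"]
print(best_match(sample))  # pattern_A
```

With only a handful of mutations per sample, as in a clinical gene panel, a profile like this becomes noisy, which is the difficulty the new algorithm is designed to overcome.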

Detecting HR’s telltale molecular clues in genetic samples is currently possible only if a person has their entire genome—some 20,000 genes—sequenced. Such extensive sequencing, however, is not done in the clinic and is limited to research uses. By comparison, most standard genetic panels analyze between 200 and 400 genes.

The advantage of the new algorithm is that it can see the molecular footprints of the HR defect even in the standard clinical tests, which analyze only a subset of genes. The researchers say their algorithm is better at detecting the presence of HR defects because it was “trained” on thousands of fully sequenced tumor genomes. This extensive training gives the algorithm a more expansive vocabulary that allows it to read and interpret many more molecular languages and misspellings based on far fewer molecular clues.

The algorithm’s ability to spot the markers of HR deficiency from only a handful of genes and a few mutations is akin to being able to understand the meaning of a text based on a single chapter (400 or so genes) instead of the entire book (20,000 fully sequenced genes), the researchers explained.

Better at reading comprehension

To test the accuracy of their model, dubbed SigMA (Signature Mutational Analysis), the investigators measured its performance against 730 samples analyzed by whole-genome sequencing, the gold standard for mutation detection. Although it was reading far fewer genes and fewer mutations, the SigMA model correctly identified 163 of 221 samples with HR deficiency, a 74 percent detection rate. This is a notable improvement over current algorithms, which detect HR-deficient cancer cells at a rate of 30 to 40 percent, the team said.
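The percentage quoted above is the share of truly HR-deficient samples that the model flagged, a measure statisticians call sensitivity. A minimal sketch of the calculation follows; the numbers come from the paragraph above, and the function name is an illustrative assumption.

```python
# Sensitivity check using the counts quoted in the article.
# The function and variable names are illustrative, not from the study.

HR_DEFICIENT_SAMPLES = 221   # samples with HR deficiency per whole-genome sequencing
CORRECTLY_IDENTIFIED = 163   # of those, the number the model flagged


def detection_rate(true_positives, total_positives):
    """Fraction of truly positive samples that were detected (sensitivity)."""
    if total_positives <= 0:
        raise ValueError("total_positives must be greater than zero")
    return true_positives / total_positives


rate = detection_rate(CORRECTLY_IDENTIFIED, HR_DEFICIENT_SAMPLES)
print(f"{rate:.1%}")  # 73.8%, which rounds to the 74 percent reported
```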

To gauge SigMA’s performance on a real gene panel, the researchers applied it to 878 breast tumor samples from patients who had been previously analyzed by a standard genetic test. The algorithm identified 23 percent of the tumor samples as bearing the mark of HR deficiency. The algorithm also detected previously unidentified HR defects in other types of cancers, ranging from 5 percent of samples in esophageal cancers to 38 percent in ovarian cancers.

To determine whether the algorithm could accurately predict response to PARP inhibitors, the scientists analyzed results from experiments on 383 patient tumor cell lines from 14 cancer types treated with four PARP inhibitors. Breast cancer cell lines identified by the SigMA algorithm as bearing the molecular mark of HR deficiency responded better to the PARP inhibitor olaparib than cells that did not bear the molecular signature of HR deficiency. A similar effect was observed in breast cancer cell lines treated with three other PARP inhibitors. Other tumor types also responded better to PARP inhibitors if they were identified as HR-deficient by the SigMA model. The observation suggests that the algorithm can reliably identify patients who could benefit from PARP inhibitors, the team said.


“We have spoken with many clinicians in the past months and we have started multiple collaborations in which additional patients in clinical trials will be given the drug based on our predictions,” Park said. “We think we could make a real impact in cancer care with this computational method.”

The researchers caution that the SigMA model cannot detect HR deficiencies in certain cancers with very few mutations—such as medulloblastoma, a type of brain cancer, and Ewing sarcoma, a type of bone cancer.

However, as the number of publicly available fully sequenced genomes grows, the algorithm could be trained on more tumor types to detect a greater variety of genetic mutations.

“The accuracy of the algorithm will vary by cancer type,” Park said. “But even when the detection rate is not as high, there still will be additional cases identified that would be otherwise missed. What this ultimately means is better targeted treatments for more people.”

Study co-investigators included Jake June-Koo Lee, Giorgio Melloni and Isidro Cortes-Ciriano.

The work was supported by funding from the Ludwig Center at Harvard and by the European Union through Marie Skłodowska-Curie grant 703543.