Sobering Reports on AI for CPR, Cancer Treatment Advice

Two new studies suggest AI voice assistants, ChatGPT not ready for prime time

Image: insjoy/iStock/Getty Images Plus

At a glance:

  • One study found serious limitations in AI-based voice assistants’ ability to guide CPR.
  • Experts recommend that to receive optimal CPR guidance, bystanders call 911 rather than use AI-based voice assistants.
  • Another study found that ChatGPT did not reliably give evidence-based recommendations for cancer treatments.
  • Researchers caution that patients should always seek guidance from their clinicians for the most accurate and individualized advice.

Artificial intelligence holds the promise to reshape clinical medicine and empower patients, but when it comes to cardiopulmonary resuscitation and cancer treatments, certain AI tools are not quite there yet, according to two separate studies led by Harvard Medical School researchers at Brigham and Women’s Hospital.


AI voice assistants deliver subpar results on CPR directions

One study, published Aug. 28 in JAMA Network Open, found CPR directions provided by AI-based voice assistants often lacked relevance and came with inconsistencies.

Researchers posed eight verbal questions to four voice assistants, including Amazon’s Alexa, Apple’s Siri, Google Assistant on Nest Mini, and Microsoft’s Cortana. They also typed the same queries into ChatGPT. All responses were evaluated by two board-certified emergency medicine physicians.

Nearly half of the responses from the voice assistants were unrelated to CPR, such as information about a movie called CPR or a link to Colorado Public Radio News. Only 28 percent suggested calling emergency services, only 34 percent provided any CPR instruction, and just 12 percent gave verbal instructions. Of the platforms tested, ChatGPT provided the most relevant information across all queries. Based on the findings, the authors concluded that relying on existing AI voice assistants may delay care and may not provide appropriate information.

Administered outside the hospital by untrained bystanders, CPR increases the chance of surviving a cardiac arrest two to four times. Bystanders can obtain CPR instructions from emergency dispatchers, but these services are not universally available and may not always be used. In emergencies, AI voice assistants may offer easy access to lifesaving CPR instructions.

So, what is the bottom line for civilians who may end up in a situation to provide first aid?

“Our findings suggest that bystanders should call emergency services rather than rely on a voice assistant,” said study senior author Adam Landman, HMS associate professor of emergency medicine at Brigham and Women’s and chief information officer and senior vice president of digital innovation at Mass General Brigham.

“Voice assistants have potential to help provide CPR instructions, but need to have more standardized, evidence-based guidance built into their core functionalities,” added Landman, who is also an attending emergency physician.

The findings should be heeded as a call to action — and as an opportunity — for tech companies to collaborate with one another and standardize their emergency responses in an effort to improve public health, Landman and colleagues urged.

ChatGPT and cancer treatment advice: Room for improvement

In another study, HMS researchers at Brigham and Women’s found that ChatGPT has limited ability to recommend cancer treatments based on national guidelines.

The research, published Aug. 24 in JAMA Oncology, showed that in nearly one-third of cases, ChatGPT 3.5 provided an incorrect recommendation. Correct and incorrect recommendations were intermingled in one-third of the chatbot’s responses, making errors more difficult to detect, the authors said.

For many patients, the internet is already a powerful tool for self-education on medical topics. The AI-powered tool ChatGPT is now increasingly used to research medical topics, but the investigators found it did not provide consistent recommendations on cancer treatments aligned with guidelines from the National Comprehensive Cancer Network.

The findings highlight the need for awareness of the technology’s limitations and the importance of working with one’s physician to individualize treatment.

“Patients should feel empowered to educate themselves about their medical conditions, but they should always discuss with a clinician, and resources on the internet should not be consulted in isolation,” said study senior author Danielle Bitterman, HMS assistant professor of radiation oncology at Brigham and Women’s and a faculty member of the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham.

“ChatGPT responses can sound a lot like a human and can be quite convincing, but when it comes to clinical decision-making, there are so many subtleties for every patient’s unique situation,” Bitterman added. “A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide.”

Thus, the authors caution, it is critical to audit the performance of AI tools and ensure that they are aligned with the best evidence and latest guidelines.

Medical decision-making is based on multiple factors, but NCCN guidelines are used widely by physicians and institutions as a foundation for these treatment choices, the authors said. Bitterman and colleagues chose to evaluate the extent to which ChatGPT’s recommendations aligned with these guidelines. They focused on the three most common cancers (breast, prostate, and lung) and prompted ChatGPT to provide a treatment approach for each cancer based on the severity of the disease.

In total, the researchers included 26 unique diagnostic descriptions and used four slightly different prompts to ask ChatGPT to provide a treatment approach, generating a total of 104 prompts.
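The study design crosses each diagnostic description with each prompt template (26 × 4 = 104). A minimal sketch of that combinatorial step, using placeholder diagnosis strings and template wordings (the study’s actual descriptions and prompts are not reproduced here):

```python
# Illustrative sketch only: the diagnosis descriptions and prompt
# templates below are hypothetical stand-ins, not the study's wording.
from itertools import product

# 26 unique diagnostic descriptions (placeholders)
diagnoses = [f"diagnosis description {i}" for i in range(1, 27)]

# 4 slightly different prompt templates (placeholders)
templates = [
    "What is a treatment for {dx}?",
    "What is the recommended treatment for {dx}?",
    "How should {dx} be treated?",
    "Provide a treatment approach for {dx}.",
]

# Cross every description with every template: 26 x 4 = 104 prompts
prompts = [t.format(dx=dx) for dx, t in product(diagnoses, templates)]
print(len(prompts))  # 104
```

Each of the 104 resulting prompts would then be submitted to the chatbot and the response scored against the guidelines.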

Ninety-eight percent of the chatbot’s responses included at least one treatment approach that agreed with NCCN guidelines. However, 34 percent of these responses also included one or more non-concordant recommendations, which were sometimes difficult to detect amid otherwise sound guidance.

A non-concordant treatment recommendation was defined as one that included one or more partly correct or incorrect recommendations. For example, for a locally advanced breast cancer, a recommendation of surgery alone, without mention of another therapy modality, would be partly correct. Complete agreement in scoring occurred in only 62 percent of cases, underscoring both the complexity of the NCCN guidelines and how vague and difficult to interpret ChatGPT’s output could be.

In 12.5 percent of cases, ChatGPT produced “hallucinations,” or a treatment recommendation entirely absent from NCCN guidelines. These included recommendations of novel therapies or curative therapies for incurable cancers. The authors emphasized that this form of misinformation could incorrectly set patients’ expectations and could even impact the clinician-patient relationship.

The authors used GPT-3.5-turbo-0301, one of the largest models available at the time they conducted the study and the model class that is currently used in the open-access version of ChatGPT. A newer version, GPT-4, was not available at the time of the study and is currently available only with a paid subscription.

The researchers used the 2021 NCCN guidelines, because GPT-3.5-turbo-0301 was developed using data up to September 2021. While results may vary if other large language models and/or clinical guidelines are used, the researchers emphasized that many such AI tools are similar in design and limitations.

Authorship, funding, disclosures

Co-authors of Quality of Layperson CPR Instructions from Artificial Intelligence Voice Assistants included William Murk, Eric Goralnick, and John S. Brownstein.

Landman reported receiving personal fees from Abbott during the conduct of the study.

Co-authors of Use of Artificial Intelligence Chatbots for Cancer Treatment Information included Shan Chen, Benjamin H. Kann, Michael B. Foote, Hugo J.W.L. Aerts, Guergana K. Savova, and Raymond H. Mak.

This study was supported by the Woods Foundation.

Aerts reported receiving personal fees from Onc.AI, Sphera, LLC, and Bristol Myers Squibb outside the submitted work. Mak reported receiving personal fees from ViewRay, AstraZeneca, Novartis, Varian Medical Systems, and Sio Capital outside the submitted work. Bitterman is the associate editor of Radiation Oncology, and receives funding from the American Association for Cancer Research outside the submitted work.

Adapted from Brigham and Women’s and Mass General Brigham news releases.