Open-Source AI Matches Top Proprietary LLM in Solving Tough Medical Cases

Greater competition between AI diagnostic tools should benefit patients and clinicians

At a glance:

  • An open-source AI model performed on par with a leading proprietary AI tool in solving tough medical cases that require complex clinical reasoning.

  • AI can optimize clinicians’ performance and help reduce diagnostic errors and delays.

  • Greater competition and more choice in AI diagnostic tools could better serve patients, clinicians, and health care systems.

Artificial intelligence can transform medicine in a myriad of ways, including its promise to act as a trusted diagnostic aide to busy clinicians.

Over the past two years, proprietary AI models, also known as closed-source models, have excelled at solving hard-to-crack medical cases that require complex clinical reasoning. Notably, these closed-source AI models have outperformed open-source ones, which are so called because their source code is publicly available and can be modified by anyone.

Has open-source AI caught up?

The answer appears to be yes, at least when it comes to one such open-source AI model, according to the findings of a new NIH-funded study led by researchers at Harvard Medical School and done in collaboration with clinicians at Harvard-affiliated Beth Israel Deaconess Medical Center and Brigham and Women’s Hospital.

The results, published March 14 in JAMA Health Forum, show that a challenger open-source AI tool called Llama 3.1 405B performed on par with GPT-4, a leading proprietary closed-source model. In their analysis, the researchers compared the performance of the two models on 92 mystifying cases featured in The New England Journal of Medicine weekly rubric of diagnostically challenging clinical scenarios.

The findings suggest that open-source AI tools are becoming increasingly competitive and could offer a valuable alternative to proprietary models.

“To our knowledge, this is the first time an open-source AI model has matched the performance of GPT-4 on such challenging cases as assessed by physicians,” said senior author Arjun Manrai, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. “It really is stunning that the Llama models caught up so quickly with the leading proprietary model. Patients, care providers, and hospitals stand to gain from this competition.”

The pros and cons of open-source and closed-source AI systems

Open-source AI and closed-source AI differ in several important ways. First, open-source models can be downloaded and run on a hospital’s private computers, keeping patient data in-house. In contrast, closed-source models operate on external servers, requiring users to transmit private data externally.

Authorship, funding, disclosures

Additional authors include Byron Crowe and Raja-Elie E. Abdulnour.

This project was supported by award K01HL138259 from the National Heart, Lung, and Blood Institute and a Harvard Medical School Dean’s Innovation Award.

Crowe reported receiving personal fees from Solera Health outside the submitted work. Rodman reported receiving grants from the Gordon and Betty Moore Foundation outside the submitted work.