As the benefits of AI in healthcare continue to emerge, we investigate GPT-4's potential in radiology. Learn about research exploring GPT-4’s potential in assisting report structuring, classifying diseases, and generating comprehensive findings summaries:
www.microsoft.com
GPT-4’s potential in shaping the future of radiology
Published November 27, 2023
By
Javier Alvarez-Valle , Senior Director of Biomedical Imaging
Matthew Lungren , Chief Medical Information Officer, Nuance Communications
Share this page
This research paper is being presented at the 2023 Conference on Empirical Methods in Natural Language Processing(opens in new tab) (EMNLP 2023), the premier conference on natural language processing and artificial intelligence.
In recent years, AI has been increasingly integrated into healthcare, bringing about new areas of focus and priority, such as diagnostics, treatment planning, patient engagement. While AI’s contribution in certain fields like image analysis and drug interaction is widely recognized, its potential in natural language tasks with these newer areas presents an intriguing research opportunity.
One notable advancement in this area involves
GPT-4’s impressive performance(opens in new tab) on medical competency exams and benchmark datasets. GPT-4 has also demonstrated
potential utility(opens in new tab) in medical consultations, providing a promising outlook for healthcare innovation.
Progressing radiology AI for real problems
Our paper, “
Exploring the Boundaries of GPT-4 in Radiology(opens in new tab),” which we are presenting at
EMNLP 2023(opens in new tab), further explores GPT-4’s potential in healthcare, focusing on its abilities and limitations in radiology—a field that is crucial in disease diagnosis and treatment through imaging technologies like x-rays, computed tomography (CT) and magnetic resonance imaging (MRI). We collaborated with our colleagues at
Nuance(opens in new tab), a Microsoft company, whose solution, PowerScribe, is used by more than 80 percent of US radiologists. Together, we aimed to better understand technology’s impact on radiologists’ workflow.
Our research included a comprehensive evaluation and error analysis framework to rigorously assess GPT-4’s ability to process radiology reports, including common language understanding and generation tasks in radiology, such as disease classification and findings summarization. This framework was developed in collaboration with a board-certified radiologist to tackle more intricate and challenging real-world scenarios in radiology and move beyond mere metric scores.
We also explored various effective zero-, few-shot, and chain-of-thought (CoT) prompting techniques for GPT-4 across different radiology tasks and experimented with approaches to improve the reliability of GPT-4 outputs. For each task, GPT-4 performance was benchmarked against prior GPT-3.5 models and respective state-of-the-art radiology models.
We found that GPT-4 demonstrates new state-of-the-art performance in some tasks, achieving about a 10-percent absolute improvement over existing models, as shown in Table 1. Surprisingly, we found radiology report summaries generated by GPT-4 to be comparable and, in some cases, even preferred over those written by experienced radiologists, with one example illustrated in Table 2.
Table 1: Results overview. GPT-4 either outperforms or is on par with previous state-of-the-art (SOTA) multimodal LLMs.
Table 2. Examples where GPT-4 findings summaries are favored over existing manually written ones on the Open-i dataset. In both examples, GPT-4 outputs are more faithful and provide more complete details on the findings.
Another encouraging prospect for GPT-4 is its ability to automatically structure radiology reports, as schematically illustrated in Figure 1. These reports, based on a radiologist’s interpretation of medical images like x-rays and include patients’ clinical history, are often complex and unstructured, making them difficult to interpret.
Research shows that structuring these reports can improve standardization and consistency in disease descriptions, making them easier to interpret by other healthcare providers and more easily searchable for research and quality improvement initiatives. Additionally, using GPT-4 to structure and standardize radiology reports can further support efforts to augment real-world data (RWD) and its use for
real-world evidence (RWE). This can complement more robust and comprehensive clinical trials and, in turn, accelerate the application of research findings into clinical practice.
Figure 1. Radiology report findings are input into GPT-4, which structures the findings into a knowledge graph and performs tasks such as disease classification, disease progression classification, or impression generation.
Beyond radiology, GPT-4’s potential extends to translating medical reports into more
empathetic(opens in new tab) and understandable formats for patients and other health professionals. This innovation could revolutionize patient engagement and education, making it easier for them and their carers to actively participate in their healthcare.
MICROSOFT RESEARCH PODCAST
Abstracts: October 23, 2023
On “Abstracts,” Partner Research Manager Andy Gordon & Senior Researcher Carina Negreanu explore new work introducing co-audit, a term for any tool-assisted experience that helps users of generative AI find and fix mistakes in AI output.
Listen now
Opens in a new tab
A promising path toward advancing radiology and beyond
When used with human oversight, GPT-4 also has the potential to transform radiology by assisting professionals in their day-to-day tasks. As we continue to explore this cutting-edge technology, there is great promise in improving our evaluation results of GPT-4 by investigating how it can be verified more thoroughly and finding ways to improve its accuracy and reliability.
Our research highlights GPT-4’s potential in advancing radiology and other medical specialties, and while our results are encouraging, they require further validation through extensive research and clinical trials. Nonetheless, the emergence of GPT-4 heralds an exciting future for radiology. It will take the entire medical community working alongside other stakeholders in technology and policy to determine the appropriate use of these tools and responsibly realize the opportunity to transform healthcare. We eagerly anticipate its transformative impact towards improving patient care and safety.
Learn more about this work by visiting the
Project MAIRA(opens in new tab) (Multimodal AI for Radiology Applications) page.
Acknowledgements
We’d like to thank our coauthors: Qianchu Liu,
Stephanie Hyland,
Shruthi Bannur,
Kenza Bouzid,
Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn,
Harshyta Sharma,
Fernando Perez-Garcia,
Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna,
Hoifung Poon,
Naoto Usuyama,
Anja Thieme,
Aditya V. Nori,
Ozan Oktay