October 10, 2024
Study Warns Against Trusting AI in Critical ER Settings
AI technology isn't ready to take over the emergency room just yet, according to a new study.
ChatGPT, a popular AI tool, could end up recommending unnecessary X-rays, antibiotics, and even hospital admissions for some patients, researchers reported Tuesday in Nature Communications. These findings suggest AI may not be ready for the complexities of ER decision-making.
"This serves as a reminder to clinicians not to place blind trust in these models," said lead researcher Chris Williams, a postdoctoral scholar at the University of California, San Francisco. "ChatGPT can handle tasks like answering medical exam questions or drafting clinical notes, but it struggles with the multiple factors involved in emergency department scenarios," Williams added.
In the study, researchers tested ChatGPT by having it provide recommendations similar to those made by ER doctors after an initial patient examination. The team ran data from 1,000 past ER visits through the AI system, asking it whether each patient should be admitted, prescribed antibiotics, or sent for X-rays.
The results revealed that ChatGPT tended to recommend more services than patients actually needed, and that over-recommendation dragged down its accuracy: the ChatGPT-4 model was 8% less accurate than human doctors, and the older ChatGPT-3.5 was 24% less accurate.
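To make the evaluation setup concrete, here is a minimal sketch of how model recommendations could be scored against the decisions clinicians actually made for each visit. It is an illustration of the scoring idea only, not the study's code, and the data fields and example records are hypothetical.

```python
# Minimal sketch (hypothetical field names): score model recommendations
# against the decisions clinicians actually made for each ER visit.
from typing import Dict, List

DECISIONS = ["admit", "antibiotics", "xray"]  # the three yes/no calls described in the study

def accuracy_by_decision(visits: List[Dict]) -> Dict[str, float]:
    """For each decision type, compute the fraction of visits where the
    model's yes/no recommendation matched the clinician's actual choice."""
    scores = {}
    for decision in DECISIONS:
        matches = sum(
            1 for v in visits
            if v["model_recommendation"][decision] == v["clinician_decision"][decision]
        )
        scores[decision] = matches / len(visits)
    return scores

# Two made-up visits: the model over-recommends admission and an X-ray in the second.
visits = [
    {"model_recommendation": {"admit": False, "antibiotics": False, "xray": True},
     "clinician_decision":   {"admit": False, "antibiotics": False, "xray": True}},
    {"model_recommendation": {"admit": True,  "antibiotics": False, "xray": True},
     "clinician_decision":   {"admit": False, "antibiotics": False, "xray": False}},
]
print(accuracy_by_decision(visits))  # {'admit': 0.5, 'antibiotics': 1.0, 'xray': 0.5}
```

A systematic tilt toward "yes" on any of these questions shows up directly as a lower match rate against clinicians, which is the pattern the researchers reported.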
This tendency to overprescribe could stem from the way AI models are trained, Williams explained. Since these systems learn from internet data, they often rely on medical advice sources that encourage users to seek further care, which may not be suitable in an ER setting. "These models are almost programmed to say, 'seek medical advice,' which is good for public safety but not always appropriate for an ER," Williams said.
In emergency rooms, unnecessary treatments can harm patients, strain resources, and increase costs. To be useful in such settings, AI needs a more refined framework that balances catching serious conditions against ordering needless tests and treatments, Williams added.
"There's no perfect solution," he noted, "but knowing these models' limitations helps us think more critically about how we want them to operate in clinical settings."