
Nicholas Fabiano, MD
@NTFabiano
AI outperformed doctors on reasoning tasks. Doctors = 30% correct diagnosis. AI = 80% correct diagnosis. 🧵1/8
These findings are from a study on @arxiv which sought to evaluate OpenAI's o1-preview model, a model designed to spend more run-time on chain-of-thought reasoning before generating a response. https://arxiv.org/abs/2412.108... 2/8
Performance of large language models (LLMs) on medical tasks has traditionally been evaluated using multiple choice question benchmarks; however, such benchmarks are highly constrained, and have an unclear relationship to performance in real clinical scenarios. 3/8
Clinical reasoning, the process by which physicians employ critical thinking to gather and synthesize clinical data to diagnose and manage medical problems, remains an attractive benchmark for model performance. 4/8
The performance of o1-preview was characterized across five experiments: differential diagnosis generation, diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts using validated psychometrics. 5/8
Significant improvements were observed in differential diagnosis generation and in the quality of diagnostic and management reasoning. 6/8
However, no improvements were observed in probabilistic reasoning or triage differential diagnosis. 7/8
Overall, this study highlights o1-preview's strong performance on tasks requiring complex critical thinking, such as diagnosis and management, while its performance on probabilistic reasoning tasks was similar to that of past models. 8/8
Read more about a scientist who treated her own cancer: https://x.com/NTFabiano/status...