
Nicholas Fabiano, MD
@NTFabiano
AI outperformed doctors on reasoning tasks. Doctors = 30% correct diagnosis. AI = 80% correct diagnosis. 🧵1/8
These findings are from a study on @arxiv which sought to evaluate OpenAI's o1-preview model, a model designed to spend more run-time on chain-of-thought reasoning before generating a response. https://arxiv.org/abs/2412.108... 2/8
Performance of large language models (LLMs) on medical tasks has traditionally been evaluated using multiple choice question benchmarks; however, such benchmarks are highly constrained, and have an unclear relationship to performance in real clinical scenarios. 3/8
Clinical reasoning, the process by which physicians employ critical thinking to gather and synthesize clinical data to diagnose and manage medical problems, remains an attractive benchmark for model performance. 4/8
The performance of o1-preview was characterized across five experiments: differential diagnosis generation, diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts using validated psychometrics. 5/8
Significant improvements were observed in differential diagnosis generation and in the quality of diagnostic and management reasoning. 6/8
However, no improvements were observed in probabilistic reasoning or triage differential diagnosis. 7/8
Overall, this study highlights o1-preview's strong performance on tasks requiring complex critical thinking, such as diagnosis and management, while its performance on probabilistic reasoning tasks was similar to that of past models. 8/8
Read more about a scientist who treated her own cancer: https://x.com/NTFabiano/status...