Harvard Study: OpenAI's o1 Beats ER Doctors at Diagnosis

What Happened

Researchers tested OpenAI's o1 reasoning model against ER physicians on historical patient cases. The model out-diagnosed the physicians by 12-17 points. The authors called it "a profound change in technology that will reshape medicine," while acknowledging that the retrospective design means prospective trials are needed before clinical deployment.

My Take

The headline number is misleading, but the underlying signal is real. Retrospective studies tend to favor models because the model isn't dealing with a screaming patient, an incomplete history, or a missing CT. But the gap is large enough that the policy question shifts from "should AI assist diagnosis" to "is it negligent not to consult it." I expect hospital systems that integrate AI second opinions into triage to see malpractice premiums fall within three years, and holdouts to see them rise.
