The effectiveness of large language models in medical AI research for physicians: A randomized controlled trial

Cell Rep Med. 2025 Dec 16;6(12):102469. doi: 10.1016/j.xcrm.2025.102469. Epub 2025 Nov 26.

Abstract

Physicians offer invaluable clinical insights, but their involvement in medical AI research is hindered by limited technical expertise. We conduct a superiority, open-label, randomized controlled trial in which 64 junior ophthalmologists undertake a 2-week project on "automated cataract identification" with minimal engineering assistance, either with (intervention, n = 32) or without (control, n = 32) ChatGPT-3.5. The overall project completion rate is higher in the intervention group than in the control group (87.5% vs. 25.0%; difference 62.5 percentage points, p = 9.42e-7), as is the unassisted completion rate (68.7% vs. 3.1%; difference 65.6 percentage points, p = 5.70e-8). The intervention group also demonstrates better project planning and shorter completion times (p < 0.01). After a 2-week washout, 41.2% of successful intervention participants complete a new project without the support of large language models (LLMs). In a survey, 42.6% of participants fear regurgitating information without understanding it and 40.4% worry about fostering lazy thinking, indicating potential dependency. LLMs can therefore help physicians overcome technical barriers, although long-term risks require further study. Trial registration: This study was registered at ClinicalTrials.gov (NCT06015178).
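The completion-rate arithmetic in the abstract can be checked with a short sketch. It assumes that the event counts can be recovered by applying the reported percentages to the 32 participants in each group; the abstract gives only percentages, so the counts below are inferred. The abstract does not name the statistical test behind its p-values, so the two-sided Fisher exact test here is a stand-in chosen for illustration, and its result need not match the reported values. All function and variable names are hypothetical.

```python
from math import comb

N_PER_ARM = 32  # participants per group, as stated in the abstract


def count_from_percent(percent, n=N_PER_ARM):
    """Infer an event count from a reported percentage (an assumption)."""
    return round(percent / 100 * n)


def difference_in_points(events_a, events_b, n=N_PER_ARM):
    """Absolute difference between two proportions, in percentage points."""
    return (events_a - events_b) / n * 100


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Inferred counts: overall completion (87.5% vs. 25.0%)
overall_int = count_from_percent(87.5)
overall_ctl = count_from_percent(25.0)
# Inferred counts: unassisted completion (68.7% vs. 3.1%)
unassisted_int = count_from_percent(68.7)
unassisted_ctl = count_from_percent(3.1)

overall_diff = difference_in_points(overall_int, overall_ctl)
unassisted_diff = difference_in_points(unassisted_int, unassisted_ctl)

p_overall = fisher_exact_two_sided(
    overall_int, N_PER_ARM - overall_int,
    overall_ctl, N_PER_ARM - overall_ctl,
)
p_unassisted = fisher_exact_two_sided(
    unassisted_int, N_PER_ARM - unassisted_int,
    unassisted_ctl, N_PER_ARM - unassisted_ctl,
)

print(f"overall: {overall_int}/32 vs {overall_ctl}/32, diff {overall_diff:.1f} points")
print(f"unassisted: {unassisted_int}/32 vs {unassisted_ctl}/32, diff {unassisted_diff:.1f} points")
print(f"Fisher exact p (illustrative): {p_overall:.3g}, {p_unassisted:.3g}")
```

Under these assumptions the inferred counts are 28 vs. 8 and 22 vs. 1 out of 32, which give differences of 62.5 and 65.6 percentage points, consistent with the figures reported in the abstract.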

Keywords: AI-augmented medical research; LLMs; large language models; medical education; randomized controlled trial.

Publication types

  • Randomized Controlled Trial

MeSH terms

  • Adult
  • Artificial Intelligence*
  • Female
  • Humans
  • Language*
  • Large Language Models
  • Male
  • Physicians*

Associated data

  • ClinicalTrials.gov/NCT06015178