Automating expert-level medical reasoning evaluation of large language models.
Zhou S, Xie W, Li J, Zhan Z, Song M, Yang H, Espinoza C, Welton L, Mai X, Jin Y, Xu Z, Chung YH, Xing Y, Tsai MH, Schaffer E, Shi Y, Liu N, Liu Z, Zhang R.
NPJ Digit Med. 2025 Dec 6. doi: 10.1038/s41746-025-02208-7. Online ahead of print.
PMID: 41353516
Free article.