Background: This study aims to compare responses to erectile dysfunction (ED)–related questions generated by DeepSeek, a newly released large language model, and ChatGPT.
Methods: The study was conducted by posing online queries to both ChatGPT-4o (OpenAI, United States) and DeepSeek-V3 (Hangzhou DeepSeek Artificial Intelligence, China). The most frequently asked questions about ED were identified using Google Trends. The responses from both artificial intelligence (AI) models were evaluated by three board-certified urologists, who rated their accuracy on a scale of 1 to 4. Readability was assessed using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), and Gunning Fog Score (GFS).
Results: DeepSeek-V3 received a significantly higher total reviewer score than ChatGPT-4o (median 4 (IQR 1) vs. 3 (IQR 1); p = 0.016), and its responses were longer (median 233 (IQR 113) vs. 139 (IQR 99) words; p = 0.004). While no significant difference was observed in FRES (-5.7 (IQR 17.5) vs. -10.1 (IQR 17.7); p = 0.140), both FKGL (16.5 (IQR 1.9) vs. 17.9 (IQR 2.7); p = 0.034) and GFS (18.8 (IQR 4.9) vs. 20.6 (IQR 5.8); p = 0.016) were significantly lower for DeepSeek-V3, indicating superior readability.
Conclusion: While both ChatGPT-4o and DeepSeek-V3 generated fluent and readable responses, DeepSeek-V3 consistently provided longer, more comprehensive, and more readable answers to ED-related questions, accompanied by higher expert-rated accuracy. These findings highlight the potential of newer AI models, driven by rapid competitive advancement in the field, to effectively address patient inquiries in sensitive medical domains such as ED.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12894-025-02037-6.
Keywords: Artificial intelligence; ChatGPT; DeepSeek; Erectile dysfunction.