Simple Prompting Enhances ChatGPT’s Diagnostic Accuracy in Psychiatric Cases / Fong, Seraphina; Carollo, Alessandro; Dal Maso, Martina; Martinotti, Giovanni; Luciani, Debora; Saeed Khan, Yasser; Pellegrini, Luca; Corazza, Ornella; Esposito, Gianluca. - (2025), pp. ---. [Epub ahead of print] [10.31219/osf.io/fd8w5_v2]

Simple Prompting Enhances ChatGPT’s Diagnostic Accuracy in Psychiatric Cases

Luca Pellegrini
2025-01-01

Abstract

Despite the centrality of diagnostic assessment in psychiatry, agreement among mental health practitioners often ranges from poor to moderate. Among other approaches, Large Language Models (LLMs), such as ChatGPT, have been investigated as standardized tools to support clinicians’ decision-making. The current work investigates the diagnostic accuracy of ChatGPT 3.5 (gpt-3.5) across different case presentation styles (i.e., vignette and outline) and prompting techniques. A total of 46 psychiatric cases with an accompanying diagnosis were used. Two trained clinical psychologists evaluated the accuracy of each generated diagnosis against the reference diagnosis. A robust statistical approach was then used to investigate the effect of case format and prompt type on average diagnostic accuracy. The results showed moderate agreement between the ratings of the two clinical psychologists (kappa = 0.687). Moreover, a statistically significant main effect of prompting technique on gpt-3.5’s diagnostic accuracy emerged (p = .009). The highest accuracy was achieved when gpt-3.5 was simply instructed to provide and justify a single diagnosis for each case, compared to when it was asked to provide a diagnosis likelihood (p < .001) or to act as a clinical psychologist (p = .001). These results reinforce the potential of LLMs as a supporting tool for the diagnostic step in psychiatry and provide general guidance for ensuring good performance when using them. Additionally, this study offers a methodological framework that can serve as an example for future research aiming to systematically evaluate LLMs’ diagnostic capabilities across different prompting strategies and case presentation formats.
2025
Epub ahead of print

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3130696
Notice: The data displayed have not been validated by the university.
