Quality of Information Provided by Artificial Intelligence Chatbots Surrounding the Management of Vestibular Schwannomas: A Comparative Analysis Between ChatGPT-4 and Claude 2

Sia, Egidio (co-first author); Capriotti, Vincenzo; Gatto, Annalisa; Vaira, Luigi Angelo; Giudici, Fabiola; Tirelli, Giancarlo (penultimate author); Boscolo-Rizzo, Paolo (last author)
2025-01-01

Abstract

Objective: To examine the quality of information provided by the artificial intelligence platforms ChatGPT-4 and Claude 2 on the management of vestibular schwannomas.
Study design: Cross-sectional.
Setting: Skull base surgeons from different centers and countries were involved.
Intervention: Thirty-six questions on vestibular schwannoma management were posed to both chatbots. The artificial intelligence responses were then evaluated by 19 lateral skull base surgeons using the Quality Assessment of Medical Artificial Intelligence (QAMAI) questionnaire, which assesses "Accuracy," "Clarity," "Relevance," "Completeness," "Sources," and "Usefulness."
Main outcome measure: The scores of the answers from both chatbots were collected and compared using the Student t test. Responses grouped by stakeholder were analyzed with the McNemar test, and the Stuart-Maxwell test was used to compare reading levels between the chatbots. The intraclass correlation coefficient was also calculated.
Results: ChatGPT-4 achieved significantly higher quality scores than Claude 2 for 14 of 36 (38.9%) questions, whereas higher quality scores for Claude 2 were observed in only 2 (5.6%) answers. The chatbots differed across the "Accuracy," "Clarity," "Completeness," "Relevance," and "Usefulness" dimensions, with ChatGPT-4 showing statistically significantly superior performance. However, no statistically significant difference was found for "Sources." In addition, ChatGPT-4 provided information at a significantly lower reading grade level.
Conclusions: The artificial intelligence platforms failed to provide consistently accurate information on the management of vestibular schwannoma, although ChatGPT-4 achieved significantly higher scores in most of the analyzed parameters. These findings demonstrate the potential for substantial misinformation among patients seeking information through these platforms.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3105539
Citations
  • PubMed Central: not available
  • Scopus: 2
  • Web of Science: 2