Letter: Shifting focus—From ChatGPT to specialised medical LLMs: Authors' reply
Giuffrè, Mauro; Kresevic, Simone
2024-01-01
Abstract
Editors,

We appreciate the comments provided by Wang et al.1 and the idea of favouring specialised medical large language models (LLMs) over general-purpose models (e.g. ChatGPT). We concur with the authors that, ideally, we would have specialised medical LLMs that leverage domain-specific datasets from sources such as PubMed and the Cochrane Library. Achieving this, however, can take different forms: adapting the prompt architecture, for example through retrieval-augmented generation with medical guidelines;2 altering the weights of the language model through fine-tuning2, 3 on medical expertise or on real/synthetic4 patient data (clinical, imaging or genomics) derived from electronic health records; moulding language models to reflect human preferences through reinforcement learning from human feedback; or training language models on tasks tailored to specialised clinical goals.

Our review quantified the variability of general-purpose LLMs such as ChatGPT in gastroenterology and hepatology, with accuracy ranging from 6.4% to 91.4%,5 highlighting the nascent stage of LLM application in specialised medical fields. This variability underscores not only the current limitations of general LLMs on complex medical topics but also the crucial need for advances focused on specialised fields. While general-purpose models such as ChatGPT are groundbreaking, they are preliminary steps towards more refined applications. Our findings advocate a prudent approach to developing next-generation LLMs that meet the stringent requirements of clinical accuracy and reliability and reduce the risk of patient harm from plausible-sounding but inaccurate answers (i.e., hallucinations).6, 7

Looking to the future, it is crucial to understand whether LLMs can both provide accurate answers and perform clinical reasoning tasks.8, 9 Clinical reasoning involves complex decision-making processes that may not be fully captured by existing model-training paradigms, which points to a fundamental need to define new tasks custom-tailored to elicit such sophisticated capabilities in LLMs.

We appreciate the dialogue initiated by these comments and are excited about the potential transformations that specialised LLMs could bring to healthcare. We are committed to contributing robustly to this evolving field and to furthering the discourse on effectively integrating LLM technologies into healthcare practice.
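For readers less familiar with the first of the adaptation strategies named above, the sketch below illustrates retrieval-augmented generation in its simplest form: retrieve the guideline passages most relevant to a question, then ask the model to answer from those passages alone. This is a minimal illustration under simplifying assumptions, not the method of the letter or of any cited work; the example passages, the word-overlap retriever and the omitted LLM call are hypothetical stand-ins (production systems typically use embedding-based retrieval over vetted guideline corpora).

```python
# Minimal sketch of retrieval-augmented generation (RAG) over guideline text.
# Everything here is illustrative: the passages, the lexical retriever and the
# final LLM call (omitted) are hypothetical stand-ins for a real pipeline.

GUIDELINE_PASSAGES = [
    "Surveillance ultrasound every 6 months is recommended in cirrhosis.",
    "Bismuth quadruple therapy is a first-line option for H. pylori eradication.",
    "Colonoscopy screening should begin at age 45 for average-risk adults.",
]

def lexical_score(query: str, passage: str) -> int:
    """Crude word-overlap score; real systems use embedding similarity."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(passages, key=lambda p: lexical_score(query, p), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model by prepending retrieved guideline excerpts."""
    context = "\n".join(f"- {p}" for p in retrieve(query, GUIDELINE_PASSAGES))
    return (
        "Answer using only the guideline excerpts below; say so if they are "
        f"insufficient.\n\nExcerpts:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("How often should surveillance ultrasound be done in cirrhosis?"))
# The assembled prompt would then be sent to an LLM API of choice (omitted).
```

Grounding the prompt in retrieved excerpts, rather than relying on the model's parametric memory, is what reduces the risk of plausible-sounding but inaccurate answers discussed in the letter.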
| File | Access | Type | Licence | Size | Format |
|---|---|---|---|---|---|
| Aliment Pharmacol Ther - 2024 - Giuffrè - Letter Shifting focus From ChatGPT to specialised medical LLMs Authors reply.pdf | Closed access | Publisher's version | Publisher copyright | 155.45 kB | Adobe PDF |