Assessing adherence to TRIPOD+AI guidelines in machine learning models for predicting small for gestational age and fetal growth restriction: a systematic review / Zamagni, Giulia; Fregona, Camilla; Barbieri, Moira; Scalia, Maria Sole; Monasta, Lorenzo; Lees, Christoph; Stampalija, Tamara; Barbati, Giulia. - In: AMERICAN JOURNAL OF OBSTETRICS & GYNECOLOGY, MATERNAL-FETAL MEDICINE. - ISSN 2589-9333. - Electronic. - 8:2(2026), p. 101862. [10.1016/j.ajogmf.2025.101862]
Assessing adherence to TRIPOD+AI guidelines in machine learning models for predicting small for gestational age and fetal growth restriction: a systematic review
Zamagni, Giulia (first author); Fregona, Camilla (second author); Barbieri, Moira; Scalia, Maria Sole; Stampalija, Tamara (penultimate author); Barbati, Giulia (last author)
2026-01-01
Abstract
Objectives: Fetal growth restriction (FGR) contributes significantly to perinatal morbidity, mortality, and long-term adverse health outcomes. While small for gestational age (SGA) is often used as a proxy for FGR, it does not necessarily indicate pathological growth restriction. Given the increasing interest in machine learning (ML) for predicting FGR/SGA, this study systematically reviews ML applications in this domain, evaluating their methodological rigor and reporting quality against standardized guidelines.

Data sources: The systematic search was conducted in MEDLINE and Scopus on June 21, 2024, following PRISMA 2020 guidelines.

Study eligibility criteria: Eligible studies implemented ML models for FGR/SGA prediction using routinely available clinical variables and reported the area under the receiver operating characteristic curve (AUROC) and/or accuracy. Exclusions included preprints, conference abstracts, systematic reviews, animal studies, and models relying exclusively on biomarkers or genomics, as these are not part of clinical practice.

Study appraisal and synthesis methods: Two independent reviewers screened articles using the Rayyan software. Risk of bias was assessed using the PROBAST checklist. Adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD+AI) guidelines was evaluated across methods, results, and discussion sections using a 4-point Likert scale. Sample size adequacy was assessed for each study, accounting for outcome type, predictors, and outcome prevalence.

Results: The search identified 272 studies, of which 20 met the inclusion criteria. Definitions of FGR/SGA were inconsistent, particularly in technical journals. Adherence to TRIPOD+AI guidelines was variable: no model reported on fairness or heterogeneity across relevant subgroups, and only 15% reported on calibration. Only 30% of studies met the minimum sample size required for ML models, indicating potential overfitting and limited generalizability.

Conclusion: Despite the potential of ML models in predicting FGR/SGA, key limitations persist, including inconsistent outcome definitions, underpowered models, and suboptimal reporting of calibration and clinical applicability. Future studies should emphasize standardized definitions, robust sample sizes, and comprehensive reporting to enhance model reliability and clinical translation.

| File | Size | Format | |
|---|---|---|---|
| Zamagni_review.pdf (open access; type: publisher's version; license: Creative Commons) | 785.63 kB | Adobe PDF | |


