Implement Machine Learning Approaches in Cancer Clinical Trials / Bedon, Luca. - (2023 Feb 21).
Implement Machine Learning Approaches in Cancer Clinical Trials
BEDON, LUCA
2023-02-21
Abstract
Despite improvements, clinical trials still fail for two main reasons: drug ineffectiveness and drug-induced toxicity, both of which stem primarily from poor cohort selection and patient monitoring. Machine learning is an area of artificial intelligence that allows computers to learn without being explicitly programmed, by analysing data and drawing conclusions from the patterns in it. This thesis investigated strategies for modernising clinical drug development by using machine learning algorithms to uncover clinically significant patterns in various sources of data, culminating in classification models. The work focused on three primary research questions.
The first research question concerned the use of a machine learning approach to identify known and novel predictors of dose-limiting toxicity by analysing clinical, baseline blood biochemistry (i.e., before the start of the phase I trial), and genetic data from a previously conducted phase Ib clinical trial in metastatic colorectal cancer patients treated with the FOLFIRI (folinic acid, 5-fluorouracil, irinotecan) plus bevacizumab regimen. The analysis pipeline included a step that selected the best predictors based on importance rankings; the optimal subset was then used to train the models. Five machine learning classification models were compared in order to select the best classifier. The Random Forest model performed best during cross-validation, with a mean Matthews correlation coefficient of 0.549 and a mean accuracy of 80.4%; at baseline, the top predictors of dose-limiting toxicity were haemoglobin, serum glutamic oxaloacetic transaminase (SGOT), and albumin.
The second research question evaluated the relationship between genetic variations covering over 60 candidate genes and carboplatin-, taxane-, and bevacizumab-induced toxicities in patients with ovarian cancer enrolled in a phase IV study. Machine learning techniques were employed to investigate and prioritise germline genetic variants associated with drug-induced toxicities, specifically hypertension, haematological toxicity, non-haematological toxicity, and proteinuria. The Boruta algorithm was applied within cross-validation to determine the importance of SNPs for predicting each toxicity. The SNPs confirmed as important were then used to train an XGBoost classifier for each toxicity. During cross-validation, the toxicity models achieved Matthews correlation coefficients ranging from 0.375 to 0.410 (accuracy from 0.696 to 0.789).
The third research question aimed to develop and validate a predictive machine learning model capable of classifying hepatocellular carcinoma patients by their cancer progression status six months after treatment, using their DNA methylation profile. The genome-wide DNA methylation profiles of 374 primary tumor specimens were used in combination with machine learning algorithms (Recursive Feature Selection, Boruta) to capture features of early tumor progression. The resulting subsets of probes were used to train and validate Random Forest models predicting a progression-free survival of more or less than 6 months. A model based on 34 epigenetic probes showed the best performance, with an accuracy of 0.80 and a Matthews correlation coefficient of 0.51 on the test set.
In conclusion, this thesis presents practical machine learning applications that offer novel ways of modernising the clinical drug development process. The models and the evidence generated from these applications might be employed in the clinical trial ecosystem to identify patients who are most likely to benefit from treatment, making trials safer and faster while also cutting failure rates.
Moreover, the analytic frameworks proposed in this thesis are generalizable and adaptable to outcomes and pathologies that fall far outside the sphere of pharmacology.

| File | Size | Format | |
|---|---|---|---|
| Luca_Bedon_PhD_thesis_def_rev.pdf (Open Access from 22/02/2024). Description: PhD thesis of Luca Bedon. Type: Doctoral thesis. | 14.98 MB | Adobe PDF | View/Open |
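The abstract reports a Matthews correlation coefficient (MCC) next to accuracy for every model. The following is a minimal sketch, not code from the thesis, of how the two metrics are computed from binary confusion-matrix counts and averaged over cross-validation folds. The function names and all counts are hypothetical and chosen only for illustration; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
from math import sqrt


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-fold counts (tp, tn, fp, fn); NOT data from the thesis.
folds = [(6, 30, 4, 5), (5, 32, 2, 6), (7, 29, 5, 4)]

# Cross-validated scores are the means of the per-fold scores.
mean_accuracy = sum(accuracy(*f) for f in folds) / len(folds)
mean_mcc = sum(mcc(*f) for f in folds) / len(folds)
print(f"mean accuracy = {mean_accuracy:.3f}, mean MCC = {mean_mcc:.3f}")

# A classifier that always predicts the majority (no-toxicity) class
# on an imbalanced fold with 40 negatives and 5 positives.
majority_only = (0, 40, 0, 5)
print(accuracy(*majority_only), mcc(*majority_only))
```

In the last example the always-negative classifier is correct on 40 of 45 patients, yet its MCC is zero because it never identifies a positive case. This is one reason to read MCC alongside accuracy when the toxicity or progression classes are imbalanced.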
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.