Comparing statistical models and machine learning algorithms in predicting football outcomes

Leonardo Egidi; Nicola Torelli
2019-01-01

Abstract

Modelling football outcomes is now widespread, and there is a strong incentive to include relevant predictors and to capture new possible correlations. From a statistical point of view, two approaches address this task: goals-based (direct) models (Dixon and Coles, 1997; Karlis and Ntzoufras, 2003), which model the number of goals scored by the two competing teams; and results-based (indirect) models, which model the probability of the categorical outcome of a win, a draw, or a loss, the so-called three-way process. Both frameworks have pros and cons. There has been a long debate over which approach is better, and many agree that any direct comparison between the forecasting abilities of the two types of models must be based on forecasts of match results (Goddard, 2005). Machine Learning tools such as Classification and Regression Trees (CART; Breiman et al., 1984) and Random Forests offer alternatives for predicting new match results (Groll et al., 2019) and in some cases have proved successful. In this paper we develop a broad comparison between some statistical results-based models and some results-based Machine Learning algorithms, to explore predictive performance for future matches. Although not conclusive, we believe our comparative review may help future scholars to discriminate between goals-based and results-based models.
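To illustrate why forecasts of match results offer a common ground for comparing the two frameworks, the following minimal sketch converts a goals-based forecast into three-way (win, draw, loss) probabilities. It is not the method of the paper: it assumes two independent Poisson goal counts, a simplification of the cited goals-based models, and the function names, the scoring rates, and the truncation point `max_goals` are all hypothetical choices made for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def three_way_probabilities(home_rate, away_rate, max_goals=15):
    """Return (P(home win), P(draw), P(away win)) for one match.

    Assumption: home and away goals are independent Poisson counts.
    Scorelines above max_goals are ignored and the result is renormalised.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Joint probability of the scoreline h-a under independence.
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical expected goals for the home and away team.
p_home, p_draw, p_away = three_way_probabilities(1.5, 1.1)
print(f"home win: {p_home:.3f}, draw: {p_draw:.3f}, away win: {p_away:.3f}")
```

Once a goals-based model is reduced to these three probabilities, it can be scored against a results-based model or a classifier with the same accuracy measures.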
2019
978-88-5495-135-8
https://www.sa-ijas.org/wp/download_files/asa_brescia_2019/Carpita_Fabbris_eds-(2019)-ASA_Conference_Book_of_Short_Papers.pdf
https://www.sa-ijas.org/statistics-for-health-and-well-being/
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2952137