Bayesian hierarchical models for predicting individual performance in soccer

Egidi, Leonardo
2018-01-01

Abstract

Although there is no consensus on how to measure and quantify individual performance in any sport, there has been less development in this area for soccer than for other major sports, and modeling for predictive purposes makes sense only once such a measurement is defined. We use the player ratings provided by a popular Italian fantasy soccer game as proxies for the players’ performance. We discuss the merits and flaws of a variety of hierarchical Bayesian models for predicting these ratings, and we compare the models on their predictive accuracy on hold-out data. Our central goals are to explore what can be accomplished with a simple, freely available dataset comprising only a few variables from the 2015–2016 season of the top Italian league, Serie A, and to focus on a small number of interesting modeling and prediction questions that arise. Among these, we highlight the importance of modeling the missing observations, and we propose two models designed for this task. We validate our models through graphical posterior predictive checks and provide out-of-sample predictions for the second half of the season, using the first half as a training set. We use Stan to sample from the posterior distributions via Markov chain Monte Carlo.
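The train-on-first-half, predict-second-half evaluation described in the abstract can be illustrated with a small sketch. This is not the paper's model: it assumes synthetic ratings, replaces the Stan MCMC fit with a closed-form normal-normal shrinkage estimate (empirical Bayes, with a known within-player standard deviation), and simply skips missed matches rather than modeling them as the paper proposes. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_ratings(n_players=30, n_matches=38, seed=0):
    """Synthetic ratings (an assumption, not the paper's data).

    Each player has a latent ability drawn from a common population;
    None marks a match the player missed.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_players):
        ability = rng.gauss(6.0, 0.5)
        p_play = rng.uniform(0.3, 0.95)
        row = [rng.gauss(ability, 1.0) if rng.random() < p_play else None
               for _ in range(n_matches)]
        ratings.append(row)
    return ratings


def split_season(ratings, n_train):
    """First n_train matches form the training set, the rest the hold-out set."""
    return [r[:n_train] for r in ratings], [r[n_train:] for r in ratings]


def no_pooling_means(train, fallback):
    """Per-player raw mean of observed ratings; fallback if a player never played."""
    means = []
    for row in train:
        obs = [x for x in row if x is not None]
        means.append(statistics.fmean(obs) if obs else fallback)
    return means


def partial_pooling_means(train, sigma=1.0):
    """Shrink each player's raw mean toward the population mean.

    Normal-normal posterior mean with known sigma; the population mean and
    between-player variance are estimated from the data (empirical Bayes).
    """
    observed = [[x for x in row if x is not None] for row in train]
    raw = [statistics.fmean(o) for o in observed if o]
    mu = statistics.fmean(raw)
    sampling_var = statistics.fmean([sigma ** 2 / len(o) for o in observed if o])
    # Method-of-moments estimate of the between-player variance, floored above zero.
    tau2 = max(statistics.pvariance(raw) - sampling_var, 1e-6)
    estimates = []
    for o in observed:
        if not o:
            estimates.append(mu)  # no data: fall back to the population mean
            continue
        w = tau2 / (tau2 + sigma ** 2 / len(o))
        estimates.append(w * statistics.fmean(o) + (1 - w) * mu)
    return estimates, mu


def holdout_rmse(predictions, test):
    """Root mean squared error over observed hold-out ratings only."""
    errs = [(x - p) ** 2
            for p, row in zip(predictions, test)
            for x in row if x is not None]
    return (sum(errs) / len(errs)) ** 0.5


if __name__ == "__main__":
    train, test = split_season(simulate_ratings(), n_train=19)
    pooled, mu = partial_pooling_means(train)
    unpooled = no_pooling_means(train, fallback=mu)
    print("no pooling RMSE:     ", round(holdout_rmse(unpooled, test), 3))
    print("partial pooling RMSE:", round(holdout_rmse(pooled, test), 3))
```

Players with few observed matches in the training half are pulled most strongly toward the population mean, which is the partial-pooling behavior a hierarchical model provides. A full Bayesian fit would additionally yield posterior uncertainty for each prediction, which this point-estimate sketch does not.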
Files in this record:
File: jqas-2017-0066.pdf
Access: Open Access from 16/08/2019
Type: Publisher's version
License: Publisher copyright
Size: 943.71 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2929586