Forecasting Tennis Player Matches Based on Machine Learning
Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing, PMLR 245:393-403, 2024.
Abstract
This paper highlights the potential of machine-learning-based analytics to improve sports modelling. We propose a supervised machine learning approach for predicting the flow of points in tennis matches. Using data from the 2023 Wimbledon Gentlemen’s singles matches, we applied Grey Relational Analysis and membership functions from fuzzy set theory to extract and rank 7 features that are strongly tied to a player’s match outcome: the player’s serve status, games won in the current set, ranking difference, distance covered, serve speed, previous victory status, and unforced errors. We used these features to build 3 supervised models, namely K-Nearest Neighbours (KNN), XGBoost, and Logistic Regression, and compared their predictive performance. We adopted a train-test split of 300 training sets and 100 testing sets. Using performance metrics such as confusion matrices, ROC curves, and F1, Precision, Recall, and Accuracy scores, we constructed a scoring table to rank the implemented models. Our results show that XGBoost achieved the strongest predictive performance, followed by KNN and Logistic Regression. 5-fold cross-validation, feature stability, and sensitivity analyses suggest that the resulting feature space is robust and stable, with features not easily subject to change in short-term predictions.
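To make the feature-ranking step concrete, here is a minimal sketch of a grey relational grade computation in plain Python. It follows the common textbook formulation (min-max normalisation, distinguishing coefficient rho = 0.5) and is not necessarily the exact procedure used in the paper; it omits the fuzzy membership functions entirely. The function names, feature names, and toy values are hypothetical and chosen only for illustration.

```python
def min_max(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grey_relational_grades(reference, features, rho=0.5):
    """Return {feature name: grey relational grade} against the reference.

    reference: outcome sequence (e.g. 1 if the player won the point, else 0).
    features:  dict mapping feature name -> sequence of the same length.
    rho:       distinguishing coefficient (0.5 is a common default, assumed here).
    """
    ref = min_max(reference)
    # Absolute deviation of each normalised feature from the normalised reference.
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, min_max(col))]
        for name, col in features.items()
    }
    all_deltas = [d for col in deltas.values() for d in col]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every feature coincides with the reference
        return {name: 1.0 for name in deltas}
    grades = {}
    for name, col in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in col]
        grades[name] = sum(coeffs) / len(coeffs)  # grade = mean coefficient
    return grades


def rank_features(grades):
    """Feature names sorted from highest to lowest grade."""
    return sorted(grades, key=grades.get, reverse=True)


# Toy, made-up values (not from the Wimbledon data set).
outcome = [0, 1, 1, 0, 1]
toy_features = {
    "serve_status": [0, 1, 1, 0, 1],            # tracks the outcome exactly
    "serve_speed": [110, 125, 118, 121, 115],   # loosely related
    "unforced_error": [1, 0, 0, 1, 0],          # opposite of the outcome
}
grades = grey_relational_grades(outcome, toy_features)
ranking = rank_features(grades)
```

A feature whose normalised sequence coincides with the reference receives a grade of 1, and grades fall as the sequences diverge, so sorting by grade yields a relevance ranking of the kind described above.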