Predictions from the cloud: using data science to predict sports performance

F. J. Blaauw, A. C. Emerencia, R. den Hartigh, M. Milovanović, I. K. Stoter, P. de Jonge.

Apr 13, 2018


In sport science, a major aim is to unravel the variables and parameters that influence sports performance. A key requirement for investigating these parameters is the availability of high-quality data: more specifically, data that contain the variables of interest and that can be analyzed to find the factors, and the relationships between factors, that influence athletes’ performance. Therefore, a data platform is needed that offers a means for collecting and storing data from various sports, and that offers a single interface for researchers to interact with and perform research on these data. To date, however, most research data are collected and stored either on researchers’ workstations or in one-off platforms.

In this paper, we present the implementation of the Data Service Hub (DSH) – an open data platform for gathering and storing sports-related data. The goal of the DSH is to offer a Platform as a Service that facilitates the distribution of data to coaches, athletes, or sport fans. Furthermore, the DSH provides a means for sports scientists to run their statistical research tools and provide coaches and athletes with real-time analysis of the collected data. Here, we will specifically focus on our successful case of connecting the DSH to speed skating data.

METHODS AND RESULTS: We used the DSH to collect, aggregate, and process data related to professional speed skaters. These data were obtained through the MYLAPS timing system and were used to predict the skaters’ finish times. We trained multiple predictive models and displayed their predictions on a dashboard. With this case study, we show the data flow of a typical DSH use case and demonstrate the different aspects of the DSH. In addition, we present finish-time predictions for the 500 m, based on the first 139 m. Across twenty-two speed skaters, the predicted finish times were close to the actual finish times (0.52% error on average).
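
As an illustration of the kind of finish-time prediction and error measure described above, the following is a minimal sketch, not the models used in this work. It assumes a single-predictor least-squares fit from the 139 m split time to the 500 m finish time, and a mean absolute percentage error as the error measure; the abstract does not specify either. The function names and all numbers are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(p - a) / a for a, p in zip(actual, predicted))


# Hypothetical placeholder values (seconds), invented for illustration only.
train_split_139m = [12.0, 12.2, 12.4, 12.6]
train_finish_500m = [35.0, 35.5, 36.1, 36.6]
test_split_139m = [12.1, 12.5]
test_finish_500m = [35.3, 36.3]

# Fit on the training races, then predict the held-out finish times.
intercept, slope = fit_line(train_split_139m, train_finish_500m)
predicted = [intercept + slope * s for s in test_split_139m]

print("predicted finish times:", [round(p, 2) for p in predicted])
print("mean error (%):", round(mean_abs_pct_error(test_finish_500m, predicted), 2))
```

A single linear predictor is the simplest possible choice here; it is shown only to make the input (an early split time), the output (a predicted finish time), and the reported error percentage concrete.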

CONCLUSION, DISCUSSION, AND FUTURE PERSPECTIVES: The presented setup is a proof of concept of a new data science technology in the field of speed skating. The possible applications of the DSH extend across different sports and users. On the one hand, the DSH could serve as a platform that provides coaches with a toolbox for gaining insight into, and improving, the performance of their athletes. On the other hand, it could provide data to researchers, potentially bringing sport science research to the next level. Applications outside the sport science field are also an option. For example, the DSH could be used for storing longitudinal psychological data, or even a combination of different sources of data.