CHAPTER 2

algorithms was first determined on seventy percent of the whole dataset, using five-fold cross-validation with feature scaling for KNN, NN, SGD, and SVC. Subsequently, for every participant we individualized the algorithms with five-fold cross-validation and a grid search over selected hyperparameters. Seventy percent of the available individual data was used as training data. After training, the algorithms were turned into persistent predictive models per participant. We used the individual models to construct confusion matrices, which in turn served as the basis for the F1-score and the accuracy per individual predictive model.

To compare the performance of the machine learning models, we included a baseline model. This baseline model checks the cumulative step count: if this count equals or exceeds the average personalized goal, the model returns true, and false otherwise. We ranked all machine learning models (including the baseline model) by the average of the F1-score and the accuracy.

3.6. Proof of Concept

We designed and implemented a Web application to demonstrate how personalized prediction based on machine learning and activity tracker data could be used in practice. The application can be accessed at http://personalized-coaching.compsy.nl/. In this application, the user inputs the values `Hour of the day', `Steps previous hour', and `Total steps till the hour', combined with the participant's ID and the algorithm to use. The Web application then uses the individualized model and the input data to predict the outcome, together with the probability thereof.

3.7. Implementation Details

We used scikit-learn (v0.18, [48]) to establish the best predictive model for each individual. Scikit-learn is an open-source Python module integrating a wide range of machine learning algorithms.
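The per-participant model selection and the baseline model described above can be sketched in scikit-learn as follows. This is a minimal illustration, not the published pipeline: the feature matrix, labels, variable names, and the hyperparameter grid are placeholders, and only the KNN classifier is shown.

```python
# Sketch of the per-participant model selection step. X holds placeholder
# features (e.g. hour of the day, steps in the previous hour, cumulative
# steps); y holds placeholder binary labels (goal reached or not).
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

rng = np.random.RandomState(0)
X = rng.rand(200, 3)             # placeholder data for one participant
y = (X[:, 2] > 0.5).astype(int)  # placeholder labels

# Seventy percent of the individual data is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

# Five-fold cross-validated grid search over selected hyperparameters,
# with feature scaling applied inside the pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
search = GridSearchCV(pipeline, {"knn__n_neighbors": [3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

# The fitted estimator becomes this participant's persistent model; the
# confusion matrix yields the F1-score and accuracy used for ranking.
y_pred = search.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(f1_score(y_test, y_pred), accuracy_score(y_test, y_pred))

def baseline_predict(cumulative_steps, personalized_goal):
    # Baseline model: true iff the cumulative step count equals or
    # exceeds the average personalized goal.
    return cumulative_steps >= personalized_goal
```

The baseline model requires no training, which is why it serves as a useful floor when ranking the learned models by the average of F1-score and accuracy.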
Scikit-learn was integrated in Anaconda (v4.2.13, [49]), and Jupyter Notebook (v4.0.6, [50]) was used in combination with Python (v3.5.2, [49]) for creating the data processing and machine learning pipeline. Jupyter Notebook is an interactive environment for writing and running code in various programming languages, such as Python. The participants, their physical activity data, and the performance results of the algorithms and models were saved in an Oracle database (v11g2 XE, [51]). The Oracle database management system is a widely used SQL-based system for persisting data. The source code and corresponding notebooks of the machine learning procedure are available as open-source software on GitHub (https://github.com/compsy/personalized-coaching-ml).

For the Web application, we used Flask (v0.10.1, [52]), a Python-based microframework for developing Web applications. We used a PostgreSQL database to store information regarding the models and the participants. The machine learning models resulting from the pipeline were exported as Python Pickle files, which were imported into the Web application. The infrastructure-as-a-service provider