Insufficient physical activity is common in modern society. Estimates of the energy expenditure (EE) of different physical activities can help people develop suitable exercise plans and improve the quality of their lifestyle. However, previous work on EE estimation still has several limitations. This study therefore proposes an accurate EE estimation model based on depth camera data combined with physical activity classification to address these limitations. To determine the best location and number of cameras for EE estimation, three depth cameras were placed at three locations, namely the side, rear side, and rear views, to obtain kinematic data. A support vector machine was used for physical activity classification. Three EE estimation models, namely linear regression, multilayer perceptron (MLP), and convolutional neural network (CNN) models, were compared to determine the model with optimal performance in each experimental setting. The results show that if only one depth camera is available, the best EE estimation is obtained using the side view and the MLP model. The mean absolute error (MAE), mean square error (MSE), and root MSE (RMSE) of the EE estimates under these settings were 0.55, 0.66, and 0.81, respectively. If higher accuracy is required, two depth cameras can be placed at the side and rear views, with the CNN model used for light-to-moderate activities and the MLP model used for vigorous activities. The RMSEs for estimating the EEs of standing, walking, and running were 0.19, 0.57, and 0.96, respectively. Applying different models to different numbers of cameras yields the optimal performance, and this is the first study to discuss this issue.
Keywords: activity classification; convolutional neural network; depth camera; energy expenditure; machine learning; multilayer perceptron; physical activity.
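
The three error metrics named in the abstract (MAE, MSE, and RMSE) follow standard definitions, and a minimal Python sketch of how they could be computed is given here for reference. This is an illustration only, not the authors' code: the function name `error_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def error_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired measured and estimated values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed error of each estimate against its measured value.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean square error
    rmse = math.sqrt(mse)                   # root of the MSE
    return mae, mse, rmse


# Hypothetical measured and estimated EE values (illustrative, not study data).
measured = [1.0, 3.0, 8.0]
estimated = [1.5, 2.5, 9.0]
mae, mse, rmse = error_metrics(measured, estimated)
print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}")
```

Because the RMSE is the square root of the MSE, the single-camera figures in the abstract can be checked against each other: the square root of 0.66 is approximately 0.81, which matches the reported RMSE.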