Imputation of missing time-activity data with long-term gaps: A multi-scale residual CNN-LSTM network model

Comput Environ Urban Syst. 2022 Jul:95:101823. doi: 10.1016/j.compenvurbsys.2022.101823. Epub 2022 May 25.

Abstract

Despite the increasing availability and spatial granularity of individuals' time-activity (TA) data, missing data, particularly long-term gaps, remain a major limitation of TA data as a primary source for human mobility studies. In the present study, we propose a two-step imputation method for missing TA data with long-term gaps that relies on both an efficient representation of TA patterns and the high regularity of TA data. The method consists of: (1) a continuous bag-of-words word2vec model that converts daily TA sequences into a low-dimensional numerical representation to reduce complexity; and (2) a multi-scale residual Convolutional Neural Network (CNN)-stacked Long Short-Term Memory (LSTM) model that captures multi-scale temporal dependencies across historical observations and predicts the missing TAs. We evaluated the proposed method with 10-fold out-of-sample cross-validation on mobile phone-based TA data collected from 180 individuals in western New York, USA, from October 2016 to May 2017. The method achieved 84% prediction accuracy, which led us to conclude that it successfully reconstructed the sequence, duration, and spatial extent of activities from incomplete TA data. We believe that the proposed method can impute incomplete TA data with relatively long-term gaps with high accuracy.
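
As an illustration of the first step, the sketch below shows how a continuous bag-of-words model frames one daily TA sequence as (context, target) training pairs, where each activity is predicted from its neighbors in the sequence. The activity labels, the one-token-per-time-slot encoding, the window size, and the function name `cbow_pairs` are assumptions made for illustration only; the abstract does not specify how the study tokenized TA sequences or configured word2vec.

```python
from typing import List, Tuple


def cbow_pairs(sequence: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build CBOW (context, target) pairs from one daily TA sequence.

    Each token is treated as the target, and up to `window` tokens on
    either side form its context. Hypothetical helper, not the authors' code.
    """
    pairs = []
    for i, target in enumerate(sequence):
        left = sequence[max(0, i - window):i]
        right = sequence[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip tokens with no neighbors (single-token days)
            pairs.append((context, target))
    return pairs


# Hypothetical day, one activity token per time slot (assumed encoding).
day = ["home", "home", "commute", "work", "work", "commute", "home"]
pairs = cbow_pairs(day, window=1)

for context, target in pairs:
    print(context, "->", target)
```

A CBOW model trained on such pairs learns one dense vector per activity token, which is the kind of low-dimensional representation that a downstream sequence model could take as input.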

Keywords: Embedding; Imputation; Long-term gaps in time-activity data; Missing data; Multi-scale residual CNN-stacked LSTM.