Design and implementation of a parallel geographically weighted k-nearest neighbor classifier

Comput Geosci. 2019 Jun:127:111-122. doi: 10.1016/j.cageo.2019.02.009. Epub 2019 Mar 5.

Abstract

The development of high-performance classifiers represents an important step in improving the timeliness of remote sensing classification in the era of high spatial resolution. Geographically weighted k-nearest neighbors (gwk-NN), a classifier that incorporates spatial information into the traditional k-NN classifier, has been shown to be better at mitigating salt-and-pepper noise and misclassification. However, the integration of spatial dependence relationships into spectral information is computationally intensive. To improve computing performance, this paper discusses two commonly used parallel strategies, data parallelism and task parallelism, for parallelizing the gwk-NN classifier in the model training and classification stages, and implements the parallel algorithm using MPI and GDAL in a C++ development environment on a standalone eight-core computer. We further investigate the potential performance of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification. The experimental results demonstrate that the parallel gwk-NN classifier can improve the efficiency of classifying high-resolution, remotely sensed images with multiple land cover types. Specifically, data parallelism is more effective than task parallelism in both the model training and classification stages because parallel overhead plays only a minor role in total execution time. In addition, dual parallelism can take advantage of both data and task parallel strategies in the image classification stage, as evidenced by the two largest speedups, attained under dual parallelism I (5.28×) and II (5.73×). Of the two, dual parallelism II, in which priority is given to data decomposition, achieves the best performance by overlapping computation and data transmission, which is compatible with the current trend toward multicore architectures.
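To make the data-parallel idea and the reported speedups more concrete, the following C++ fragment is a minimal sketch, not the authors' implementation. It assumes the image is decomposed into contiguous row strips, one per worker, which the abstract does not specify, and it contains no MPI or GDAL calls. The names `RowStrip`, `split_rows`, and `parallel_efficiency` are hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous strip of image rows assigned to one worker.
// Hypothetical type: the abstract does not describe the actual decomposition.
struct RowStrip {
    std::size_t first_row;  // index of the first row in the strip
    std::size_t row_count;  // number of rows in the strip
};

// Split `total_rows` image rows into `workers` contiguous strips whose sizes
// differ by at most one row, so that each worker gets a similar load.
std::vector<RowStrip> split_rows(std::size_t total_rows, std::size_t workers) {
    assert(workers > 0);
    std::vector<RowStrip> strips;
    strips.reserve(workers);
    const std::size_t base = total_rows / workers;   // rows every worker gets
    const std::size_t extra = total_rows % workers;  // leftover rows to spread
    std::size_t next = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t count = base + (w < extra ? 1 : 0);
        strips.push_back({next, count});
        next += count;
    }
    return strips;
}

// Parallel efficiency: speedup divided by the number of cores used.
double parallel_efficiency(double speedup, int cores) {
    assert(cores > 0);
    return speedup / cores;
}
```

With the figures reported above, `parallel_efficiency(5.73, 8)` gives roughly 0.72 for dual parallelism II and `parallel_efficiency(5.28, 8)` gives 0.66 for dual parallelism I, assuming all eight cores of the test machine were in use. In an MPI program, the strips returned by `split_rows` could plausibly serve as the per-rank read windows for the raster, but that mapping is an assumption rather than something the abstract states.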

Keywords: geostatistical models; parallel computing; parallel gwk-NN classifier; remotely sensed image classification.