Sensors (Basel). 2017 Dec 22;18(1):18. doi: 10.3390/s18010018.

Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

Phan Thanh Noi et al. Sensors (Basel).
Free PMC article

Abstract

In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), have been reported as among the best at producing high accuracies. However, only a few studies have compared the performance of these classifiers across different training sample sizes on the same remote sensing images, particularly Sentinel-2 MultiSpectral Instrument (MSI) data. In this study, we examined and compared the performance of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam, containing six land use/cover types, was classified using 14 different training sample sets, both balanced and imbalanced, ranging from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA), ranging from 90% to 95%. Across the three classifiers and 14 sub-datasets, SVM produced the highest OA and was least sensitive to the training sample size, followed by RF and then kNN. With respect to sample size, all three classifiers achieved a similarly high OA (over 93.85%) once the training sample was large enough, i.e., greater than 750 pixels/class, corresponding to roughly 0.25% of the total study area. This high accuracy was achieved with both imbalanced and balanced datasets.
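As a rough illustration of the comparison described in the abstract, the sketch below trains RF, kNN, and SVM classifiers on progressively larger balanced training subsets and reports overall accuracy (OA) on a held-out set. This is not the authors' code: it assumes Python with scikit-learn, uses synthetic feature vectors as a stand-in for labeled Sentinel-2 pixels, and the classifier settings (500 trees, k = 5, RBF kernel with C = 10 and gamma = 0.1) are illustrative defaults rather than the tuned parameters from the paper.

# Minimal sketch, not the study's actual workflow: scikit-learn classifiers on
# synthetic data standing in for labeled Sentinel-2 pixels (6 land cover classes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 10 "spectral" features, 6 classes.
X, y = make_classification(n_samples=20000, n_features=10, n_informative=8,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

classifiers = {
    "RF":  RandomForestClassifier(n_estimators=500, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=10, gamma=0.1),
}

# Vary the (balanced) training sample size per class, loosely mirroring the
# paper's sub-datasets of 50 to over 1250 pixels/class.
for pixels_per_class in (50, 250, 750, 1250):
    idx = np.concatenate([np.where(y_pool == c)[0][:pixels_per_class]
                          for c in np.unique(y_pool)])
    for name, clf in classifiers.items():
        clf.fit(X_pool[idx], y_pool[idx])
        oa = accuracy_score(y_test, clf.predict(X_test))
        print(f"{pixels_per_class:5d} px/class  {name:>3}  OA = {oa:.3f}")

In the study itself the training pixels come from reference data over the Sentinel-2 scene and accuracy is assessed against an independent validation set, but the overall loop structure (fixed test set, growing training subsets, one OA per classifier and sample size) is the same idea.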

Keywords: Random Forest (RF); Sentinel-2; Support Vector Machine (SVM); classification algorithms; k-Nearest Neighbor (kNN); training sample size.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Number of articles in the ISI Web of Knowledge for a general search on different keywords in the last decade (2007–2017*). *Accessed on 13 August 2017.
Figure 2. Flowchart of the study methods.
Figure 3. Location of the study area in the north of the Red River Delta (RRD), Vietnam.
Figure 4. The relationship between classification error (y-axis) and the k value (x-axis, 1 to 20) of the kNN classifier, obtained from the bootstrap resampling approach using different sub-datasets of training sample data.
Figure 5. Effect of the number of trees and the number of random split variables at each node (mtry) on the overall accuracy of RF classification using all training sample data.
Figure 6. The relationship between OOB error (y-axis) and the ntree parameter (x-axis) of the RF classifier using different sub-datasets of training sample data.
Figure 7. The relationship between classification error and the parameters (C and γ) of the SVM classifier obtained from different sub-datasets of training sample data (see the tuning sketch after this list).
Figure 8. The performance of the kNN, SVM, and RF classifiers on different imbalanced training sample sizes.
Figure 9. The performance of the kNN, SVM, and RF classifiers on different balanced training sample sizes.
Figure 10. The difference in OA between balanced and imbalanced datasets.
Figure A1. The classification results with the highest overall accuracy for each classifier.
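Figures 4–7 summarize how each classifier's free parameters were tuned: the k value for kNN (via bootstrap resampling), ntree and mtry for RF (via OOB error and overall accuracy), and C and γ for SVM. As a hedged illustration only, the sketch below runs comparable parameter searches with scikit-learn; it substitutes a plain cross-validated grid search for the paper's bootstrap/OOB procedures, approximates mtry with scikit-learn's max_features, and again uses synthetic data rather than Sentinel-2 pixels.

# Hypothetical parameter searches mirroring Figures 4-7, not the authors' setup:
# a cross-validated grid search stands in for bootstrap resampling (kNN, SVM)
# and OOB-error evaluation (RF).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for one training sub-dataset (6 classes, 10 features).
X, y = make_classification(n_samples=3000, n_features=10, n_informative=8,
                           n_classes=6, n_clusters_per_class=1, random_state=0)

searches = {
    # Figure 4: classification error versus k = 1..20.
    "kNN": GridSearchCV(KNeighborsClassifier(),
                        {"n_neighbors": list(range(1, 21))}, cv=5),
    # Figures 5-6: number of trees (ntree) and split variables per node
    # (mtry, here max_features).
    "RF": GridSearchCV(RandomForestClassifier(random_state=0),
                       {"n_estimators": [100, 250, 500, 1000],
                        "max_features": [2, 3, 4, 6]}, cv=5),
    # Figure 7: C and gamma for the RBF kernel.
    "SVM": GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100],
                         "gamma": [0.001, 0.01, 0.1, 1]}, cv=5),
}

for name, search in searches.items():
    search.fit(X, y)
    print(name, search.best_params_, f"CV accuracy = {search.best_score_:.3f}")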

