A Massive Data Framework for M-Estimators with Cubic-Rate

J Am Stat Assoc. 2018;113(524):1698-1709. doi: 10.1080/01621459.2017.1360779. Epub 2018 Jun 19.

Abstract

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators, which are formed as a weighted average with weights depending on the subgroup sample sizes. Under certain conditions on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotically normal distribution, making them more tractable in both computation and inference than the original M-estimators based on pooled data. Our theory applies to a wide class of M-estimators with cube root convergence rate, including the location estimator, maximum score estimator and value search estimator. Simulations and a real data application also support our theoretical findings.

Keywords: Cubic rate asymptotics; M-estimators; divide and conquer; massive data.
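
Illustration: the aggregation step described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `chernoff_location` and `divide_and_conquer` are hypothetical, the location estimator is taken to be a Chernoff-type mode estimator (the center of a fixed-length window covering the most observations, a standard example of a cube-root-rate estimator), and the weights are assumed to be proportional to the subgroup sample sizes, since the abstract says only that they depend on those sizes.

```python
import numpy as np


def chernoff_location(x, half_width=1.0):
    """Center of a window of length 2 * half_width that covers the most points.

    Illustrative cube-root-rate location estimator (an assumption of this
    sketch, not necessarily the exact estimator studied in the article).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    # For each left endpoint xs[i], find one past the last point <= xs[i] + 2h.
    right = np.searchsorted(xs, xs + 2.0 * half_width, side="right")
    counts = right - np.arange(xs.size)
    i = int(np.argmax(counts))   # left-most point of the best window
    j = int(right[i]) - 1        # right-most point covered by that window
    # Any center in [xs[j] - h, xs[i] + h] covers the same points; take the midpoint.
    return 0.5 * (xs[i] + xs[j])


def divide_and_conquer(x, n_subgroups, estimator, rng=None):
    """Weighted average of subgroup estimates.

    Weights are proportional to subgroup size (assumed form of the weights).
    """
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(shuffled, n_subgroups)  # sizes may differ by one
    sizes = np.array([b.size for b in blocks], dtype=float)
    estimates = np.array([estimator(b) for b in blocks])
    weights = sizes / sizes.sum()
    return float(np.dot(weights, estimates))


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=20_000)
    pooled = chernoff_location(data)
    aggregated = divide_and_conquer(data, 20, chernoff_location, rng=1)
    print("pooled estimate:    ", pooled)
    print("aggregated estimate:", aggregated)
```

Each subgroup estimate is computed independently, so the loop over blocks could be distributed across machines; only the subgroup sizes and estimates need to be collected for the final average.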