A parametric framework for multidimensional linear measurement error regression

PLoS One. 2022 Jan 21;17(1):e0262148. doi: 10.1371/journal.pone.0262148. eCollection 2022.

Abstract

The ordinary linear regression method is limited to bivariate data because it is based on the Cartesian representation y = f(x). Using the chain rule, we transform the method to the parametric representation (x(t), y(t)) and obtain a linear regression framework in which the weighted average is used as a parameter for a multivariate linear relation for a set of linearly related variable vectors (LRVVs). We confirm the proposed approach by a Monte Carlo simulation, where the minimum coefficient of variation for error (CVE) provides the optimal weights when forming a weighted average of LRVVs. Then, we describe a parametric linear regression (PLR) algorithm in which the Moore-Penrose pseudoinverse is used to estimate measurement error regression (MER) parameters individually for the given variable vectors. We demonstrate that MER parameters from the PLR and nonlinear ODRPACK methods are quite similar for a wide range of reliability ratios, but ODRPACK is formulated only for bivariate data. We identify scale invariant quantities for the PLR and weighted orthogonal regression (WOR) methods and their correspondences with the partitioned residual effects between the variable vectors. Thus, the specification of an error model for the data is essential for MER and we discuss the use of Monte Carlo methods for estimating the distributions and confidence intervals for MER slope and correlation coefficient. We distinguish between elementary covariance for the y = f(x) representation and covariance vector for the (x(t), y(t)) representation. We also discuss the multivariate generalization of the Pearson correlation as the contraction between Cartesian polyad alignment tensors for the LRVVs and weighted average. Finally, we demonstrate the use of multidimensional PLR in estimating the MER parameters for replicate RNA-Seq data and quadratic regression for estimating the parameters of the conical dispersion of read count data about the MER line.

MeSH terms

  • Algorithms*
  • Linear Models
  • Monte Carlo Method

Grants and funding

The author, Stanley Luck, is a member of Vector Analytics LLC, which is a science consulting company. The funder provided support in the form of salaries for authors [SL], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.