Statistical Issues and Group Classification in Plasma MicroRNA Studies With Data Application

Evol Bioinform Online. 2020 Apr 14;16:1176934320913338. doi: 10.1177/1176934320913338. eCollection 2020.

Abstract

The analysis of plasma microRNAs (miRNAs) has been widely used to find potential biomarkers for human diseases, especially those linked to cancer. Methods for analyzing plasma miRNA have been thoroughly discussed, from sample extraction to data modeling. However, some issues within the process have rarely been addressed. Rice et al. discussed some issues in plasma miRNA studies, such as the lack of standard methodology, including the use of different cycle thresholds and differing times to plasma extraction, among others. These issues can lead to inconsistent data and thus affect the results and assay reproducibility. Other external issues, such as batch effect and operator effect, may also indirectly affect the statistical analysis. Here, we discuss issues in plasma miRNA studies from a statistical point of view. The interaction effect of different ways of calculating fold-change, the choice of housekeeping genes, and methods of normalization are among the issues we discuss, with data demonstrations. P values are calculated and compared to determine the effect of these issues on statistical conclusions. Statistical methods such as analysis of variance (ANOVA) and analysis of covariance (ANCOVA) are crucial in the analysis of miRNA, but investigators are often confused about them; therefore, a brief explanation of these methods is also included. In addition, 3-group classification is discussed, as it is often more challenging than 2-group classification.

Keywords: ANCOVA; ANOVA; batch effect; classification; fold-change; housekeeping genes; normalization; operator effect; plasma miRNA; quantile normalization; varying threshold.