Analysis of high-throughput biological data using their rank values

Stat Methods Med Res. 2019 Aug;28(8):2276-2291. doi: 10.1177/0962280218764187. Epub 2018 Mar 21.

Abstract

High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, the methods available in the literature have become increasingly specialized and often require substantial computational resources. Here, we propose a new versatile method based on the rank values obtained by ordering the data. We use linear algebra and the Perron-Frobenius theorem, and we extend a method presented earlier for finding differentially expressed genes to the detection of recurrent copy number aberrations. One result derived from the proposed method is a one-sample Student's t-test based on rank values. To our knowledge, the proposed method is the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast and deterministic, and it imposes a low computational load. A probability is associated with each gene, which allows a statistically significant subset of the data set to be selected. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros.
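
To make the idea of a one-sample Student's t-test on rank values concrete, the following is a minimal Python sketch and not the fcros implementation. It assumes each gene has one score per pairwise comparison, that scores are ranked within each comparison and scaled to (0, 1], and that the null hypothesis is a mean scaled rank of 0.5. The function names, the input layout and the reference value 0.5 are all assumptions made for illustration.

```python
import math
import statistics


def rank_fractions(values):
    """Rank values in ascending order and scale the ranks to (0, 1].

    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the block of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks


def rank_t_statistics(comparisons, mu0=0.5):
    """Per-gene one-sample t statistic of scaled ranks against mu0.

    `comparisons` is a list of columns (hypothetical layout): each column
    holds one score per gene for a single pairwise comparison.
    """
    columns = [rank_fractions(column) for column in comparisons]
    m = len(columns)
    n_genes = len(comparisons[0])
    results = []
    for g in range(n_genes):
        gene_ranks = [column[g] for column in columns]
        mean = statistics.fmean(gene_ranks)
        sd = statistics.stdev(gene_ranks)  # sample standard deviation
        if sd == 0:
            # Identical ranks in every comparison: the statistic is unbounded.
            diff = mean - mu0
            results.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
        else:
            results.append((mean - mu0) / (sd / math.sqrt(m)))
    return results


if __name__ == "__main__":
    # Three comparisons (columns), four genes (rows within each column).
    scores = [[4, 1, 2, 3], [3, 1, 4, 2], [4, 2, 1, 3]]
    print(rank_t_statistics(scores))
```

A gene that is consistently ranked near the top (or bottom) across comparisons yields a large positive (or negative) statistic, while a gene whose rank fluctuates around the middle yields a value near zero. Converting the statistic to a probability would require the t distribution with m - 1 degrees of freedom, which is omitted here to keep the sketch dependency-free.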

Keywords: Microarray; count reads; differentially expressed genes; recurrent copy number aberration; sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cytogenetic Analysis / methods*
  • Data Interpretation, Statistical*
  • Gene Expression Profiling / methods*
  • Humans
  • Microarray Analysis / methods*