Analysis of high-throughput biological data using their rank values

Doulaye Dembélé

doi:10.1177/0962280218764187

Stat Methods Med Res. 2019 Aug;28(8):2276-2291. doi: 10.1177/0962280218764187. Epub 2018 Mar 21.

Author

Doulaye Dembélé¹

Affiliation

¹ Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS UMR 7104, INSERM U 1258, Université de Strasbourg, Illkirch-Graffenstaden, France.

PMID: 29560792
DOI: 10.1177/0962280218764187

Abstract

High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, the methods available in the literature have become more specialized and often require substantial computational resources. Here, we propose a new versatile method based on the rank values obtained by ordering the data. We use linear algebra and the Perron-Frobenius theorem, and we extend a method presented earlier for finding differentially expressed genes to the detection of recurrent copy number aberrations. One result derived from the proposed method is a one-sample Student's t-test based on rank values. To our knowledge, the proposed method is the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast and deterministic, and it has a low computational load. Probabilities are associated with genes to allow the selection of a statistically significant subset of the data set. Stability scores are also introduced as quality parameters. Performance and comparative analyses were carried out using real data sets. The proposed method is available as an R package from the Comprehensive R Archive Network (CRAN) website (https://cran.r-project.org/web/packages/fcros).
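
The idea of a one-sample Student's t-test on rank values can be illustrated with a minimal Python sketch. This is not the fcros implementation and does not reproduce the paper's algorithm. It assumes a small matrix of per-gene values (for example, log fold changes) with one column per comparison, ranks the genes within each column, scales the ranks by the number of genes, and tests each gene's mean scaled rank against the value expected when ranks are assigned at random. The function names `scaled_ranks` and `rank_t_statistics`, the scaling, and the null mean are all assumptions made for illustration.

```python
import math
import statistics


def scaled_ranks(values):
    """Ascending ranks (1 = smallest), ties averaged, divided by len(values)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def rank_t_statistics(matrix):
    """One-sample t statistic per gene, computed on scaled ranks.

    matrix[g][c] is the value of gene g in column c (hypothetical layout).
    Assumed null mean: (n_genes + 1) / (2 * n_genes), the average scaled rank.
    """
    n_genes = len(matrix)
    n_cols = len(matrix[0])
    mu0 = (n_genes + 1) / (2 * n_genes)
    # Rank the genes within each column.
    columns = [
        scaled_ranks([matrix[g][c] for g in range(n_genes)])
        for c in range(n_cols)
    ]
    t_values = []
    for g in range(n_genes):
        r = [columns[c][g] for c in range(n_cols)]
        mean = statistics.fmean(r)
        sd = statistics.stdev(r)
        if sd == 0.0:
            # Identical ranks in every column: the statistic is unbounded.
            t = 0.0 if mean == mu0 else math.copysign(math.inf, mean - mu0)
        else:
            t = (mean - mu0) / (sd / math.sqrt(n_cols))
        t_values.append(t)
    return t_values


# Made-up values: 5 genes (rows) by 4 comparisons (columns).
example = [
    [2.0, 1.5, 1.8, 2.1],
    [0.1, -0.1, 0.4, 0.2],
    [-0.2, 0.2, 0.0, -0.4],
    [0.3, 1.9, -0.3, 0.5],
    [-1.5, -1.0, -2.0, -1.2],
]
t_values = rank_t_statistics(example)
```

In this toy input the first gene ranks near the top in every column and receives a large positive statistic, the last gene always ranks lowest and receives an unbounded negative one, and genes whose ranks hover around the middle stay close to zero. Converting the statistics to probabilities would require the t distribution with `n_cols - 1` degrees of freedom, which is left out here.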

Keywords: Microarray; count reads; differentially expressed genes; recurrent copy number aberration; sequencing.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cytogenetic Analysis / methods*
Data Interpretation, Statistical*
Gene Expression Profiling / methods*
Humans
Microarray Analysis / methods*