A novel bioinformatics approach to identify the consistently well-performing normalization strategy for current metabolomic studies

Brief Bioinform. 2019 Nov 28;bbz137. doi: 10.1093/bib/bbz137. Online ahead of print.


Unwanted experimental/biological variation and technical error are frequently encountered in current metabolomics and require normalization methods to remove the undesired data fluctuations. To ensure the thorough removal of unwanted variation, multiple criteria ('intragroup variation', 'marker stability' and 'classification capability') must be considered collectively. However, due to the limited number of available normalization methods, it is extremely challenging to find one that meets all of these criteria. Herein, a novel approach is proposed to discover normalization strategies that are consistently well performing (CWP) under all criteria. First, based on various benchmarks, all normalization methods popular in current metabolomics were found to be non-CWP. Then, 21 new strategies that combine a 'sample'-based method with a 'metabolite'-based one were found to be CWP. Finally, a variety of currently available methods (such as cubic splines, range scaling, level scaling, EigenMS, cyclic loess and mean) were identified as CWP when combined with another normalization method. In conclusion, this study not only discovered several strategies that perform consistently well under all criteria, but also proposed a novel approach that can support the identification of CWP strategies for future biological problems.
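To illustrate what combining a 'sample'-based method with a 'metabolite'-based one can look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes a matrix with samples as rows and metabolites as columns, takes mean normalization as the sample-based step (each sample divided by its own mean intensity) and range scaling as the metabolite-based step (each metabolite centered and divided by its range), and applies them in that order. The pairing, the order, the function names and the toy data are all assumptions made for illustration; the abstract does not state which of the 21 combinations these two methods belong to.

```python
def mean_normalize_samples(data):
    """Sample-based step (assumed definition): divide every intensity in a
    sample (row) by that sample's mean intensity."""
    normalized = []
    for row in data:
        row_mean = sum(row) / len(row)
        normalized.append([value / row_mean for value in row])
    return normalized


def range_scale_metabolites(data):
    """Metabolite-based step (assumed definition): center each metabolite
    (column) on its mean and divide by its range (max - min)."""
    scaled_columns = []
    for column in zip(*data):
        col_mean = sum(column) / len(column)
        col_range = max(column) - min(column)  # assumed non-zero in this sketch
        scaled_columns.append([(value - col_mean) / col_range for value in column])
    # Transpose back to samples-as-rows.
    return [list(row) for row in zip(*scaled_columns)]


def combined_strategy(data):
    """Hypothetical combined strategy: sample-based step first, then the
    metabolite-based step. The order is an assumption."""
    return range_scale_metabolites(mean_normalize_samples(data))


# Toy intensities: 3 samples x 3 metabolites (made-up numbers).
# Sample 2 is sample 1 measured at twice the overall intensity.
intensities = [
    [10.0, 200.0, 30.0],
    [20.0, 400.0, 60.0],
    [15.0, 250.0, 50.0],
]
normalized = combined_strategy(intensities)
```

In this toy matrix the first two samples differ only by an overall intensity factor, so the sample-based step makes them identical, while the metabolite-based step puts every metabolite on a comparable scale regardless of its abundance. Judging whether such a combination is CWP would still require scoring it against the three criteria named in the abstract, which this sketch does not attempt.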

Keywords: area under the curve; bioinformatics; consistency score; metabolomics; normalization.