Optimal Sparsity Selection Based on an Information Criterion for Accurate Gene Regulatory Network Inference

Deniz Seçilmiş; Sven Nelander; Erik L L Sonnhammer

doi:10.3389/fgene.2022.855770

Front Genet. 2022 Jul 13;13:855770. doi: 10.3389/fgene.2022.855770. eCollection 2022.

Authors

Deniz Seçilmiş¹, Sven Nelander², Erik L L Sonnhammer¹

Affiliations

¹ Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden.
² Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.

Abstract

Accurate inference of gene regulatory networks (GRNs) is important for unraveling unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false positives and false negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. In the absence of criteria for sparsity selection, a simplistic solution is to pick the GRN with a certain number of links per gene that is guessed to be reasonable. However, this does not guarantee finding the GRN that has the correct sparsity or is the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a "GRN information criterion" (GRNIC) that is inspired by two commonly used model selection criteria, the Akaike and Bayesian information criteria (AIC and BIC), but adapted to GRN inference. The results show that the approach can, in most cases, find the GRN whose sparsity is close to the true sparsity and whose accuracy is close to the best achievable with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/.

Keywords: gene expression data; gene regulatory network inference; information criteria; noise in gene expression; sparsity selection.
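
The idea in the abstract of scoring candidate networks at different sparsities with an information criterion can be illustrated with a minimal sketch. This is not the SPA algorithm and not the GRNIC, whose definition is not given in the abstract. It uses the ordinary BIC on a toy linear model, and every name in it (`make_toy_data`, `candidate_networks`, `select_sparsity`) is hypothetical. Candidate networks are assumed to come from keeping the k strongest links of a least-squares estimate, which stands in for whatever GRN inference method is actually used.

```python
import numpy as np


def make_toy_data(seed=0, n_genes=10, n_samples=200, n_links=20, noise=0.1):
    """Toy linear model Y = X @ B + noise with a sparse 'true network' B.

    Assumption for illustration only; not the data model of the paper.
    """
    rng = np.random.default_rng(seed)
    B = np.zeros(n_genes * n_genes)
    idx = rng.choice(B.size, size=n_links, replace=False)
    B[idx] = rng.choice([-1.0, 1.0], size=n_links)
    B = B.reshape(n_genes, n_genes)
    X = rng.standard_normal((n_samples, n_genes))
    Y = X @ B + noise * rng.standard_normal((n_samples, n_genes))
    return X, Y, B


def bic(rss, n_obs, n_params):
    """Standard BIC for Gaussian errors: goodness of fit plus a size penalty."""
    return n_obs * np.log(rss / n_obs) + n_params * np.log(n_obs)


def candidate_networks(full_estimate, n_links_grid):
    """Yield (k, network) pairs that keep only the k largest-magnitude links."""
    flat = np.abs(full_estimate).ravel()
    order = np.argsort(flat)[::-1]  # strongest links first
    for k in n_links_grid:
        mask = np.zeros(flat.size, dtype=bool)
        mask[order[:k]] = True
        yield k, np.where(mask.reshape(full_estimate.shape), full_estimate, 0.0)


def select_sparsity(X, Y, n_links_grid):
    """Score each candidate sparsity with BIC and return the best one."""
    full, *_ = np.linalg.lstsq(X, Y, rcond=None)  # dense starting estimate
    scores = {}
    for k, net in candidate_networks(full, n_links_grid):
        rss = float(np.sum((Y - X @ net) ** 2))
        scores[k] = bic(rss, Y.size, k)
    best = min(scores, key=scores.get)
    return best, scores


X, Y, B_true = make_toy_data(seed=0)
best_k, scores = select_sparsity(X, Y, [5, 10, 20, 40, 100])
print("selected number of links:", best_k)
print("true number of links:", int(np.count_nonzero(B_true)))
```

The contrast with the fixed "links per gene" heuristic is that the number of links is chosen by the data through the score rather than guessed in advance. In this sketch the penalty term is what discourages dense networks that fit noise.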