ATTED-II v11: A Plant Gene Coexpression Database Using a Sample Balancing Technique by Subagging of Principal Components

Plant Cell Physiol. 2022 Jun 15;63(6):869-881. doi: 10.1093/pcp/pcac041.

Abstract

ATTED-II (https://atted.jp) is a gene coexpression database for nine plant species based on publicly available RNAseq and microarray data. One of the challenges in constructing condition-independent coexpression data from publicly available gene expression data is managing the inherent sampling bias. Here, we report ATTED-II version 11, in which we adopted a coexpression calculation methodology that balances the samples using principal component analysis and ensemble calculation. This approach has two advantages. First, omitting principal components with low contribution rates removes the main sources of noise. Second, balancing the large differences in contribution rates allows the full range of sample conditions to be taken into account. In addition, based on the RNAseq- and microarray-based coexpression data, we provide species-representative, integrated coexpression information to make interspecies comparison of coexpression data more efficient. These coexpression data are provided as standardized z-scores to facilitate integrated analysis with other data sources. We believe that these improvements make ATTED-II more valuable and powerful for supporting interspecies comparative studies and integrated analyses using heterogeneous data.
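
The abstract does not give the computational details, so the following is only a minimal sketch of how sample balancing by subagging of principal components might look. It is not the ATTED-II pipeline. Everything in it is an assumption made for illustration: the function name `subagged_pc_coexpression`, the parameters `n_keep`, `n_sub` and `n_rounds`, the use of unit-scaled component loadings to give every retained component equal weight, Pearson correlation as the similarity measure, and z-scoring over all gene pairs.

```python
import numpy as np

def subagged_pc_coexpression(expr, n_keep=10, n_sub=5, n_rounds=50, seed=0):
    """Illustrative sketch only (hypothetical, not the published method).

    expr: genes x samples expression matrix.
    Returns a genes x genes matrix of z-scored coexpression values.
    """
    rng = np.random.default_rng(seed)

    # Center each gene across samples before the decomposition.
    centered = expr - expr.mean(axis=1, keepdims=True)

    # SVD: columns of u are per-gene coordinates on each principal
    # component; s is sorted in descending order.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)

    # Drop components with low contribution: keep only the top n_keep
    # (and never keep numerically null components).
    n_keep = min(n_keep, int(np.sum(s > s[0] * 1e-10)))
    if n_keep < 2:
        raise ValueError("need at least two non-trivial components")

    # Assumption: using u without multiplying by s gives every retained
    # component the same weight, which balances contribution rates.
    scores = u[:, :n_keep]
    n_sub = max(2, min(n_sub, n_keep))

    # Subagging: average correlations over random subsets of components
    # drawn without replacement.
    n_genes = expr.shape[0]
    acc = np.zeros((n_genes, n_genes))
    for _ in range(n_rounds):
        idx = rng.choice(n_keep, size=n_sub, replace=False)
        acc += np.corrcoef(scores[:, idx])
    mean_corr = acc / n_rounds

    # Standardize to z-scores using the distribution over all gene pairs.
    iu = np.triu_indices(n_genes, k=1)
    pair_vals = mean_corr[iu]
    z = (mean_corr - pair_vals.mean()) / pair_vals.std()
    np.fill_diagonal(z, np.nan)  # self-pairs are not meaningful
    return z
```

Calling `subagged_pc_coexpression(expr)` on a genes-by-samples NumPy array returns a symmetric matrix in which larger values indicate stronger coexpression relative to the background of all gene pairs. Expressing the result as a z-score is what would let values from different platforms or species be placed on a common scale, as the abstract describes.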

Keywords: Arabidopsis; Comparative transcriptomics; Database; Gene coexpression; Gene network; Statistics.

MeSH terms

  • Arabidopsis* / genetics
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Plant
  • Gene Regulatory Networks
  • Genes, Plant* / genetics