ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery

Plant Cell Physiol. 2017 Jan 1;58(1):e4. doi: 10.1093/pcp/pcw200.


ThaleMine is a comprehensive data warehouse that integrates a wide array of genomic information for the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport), ThaleMine uses the InterMine software framework, which builds well-structured data and provides powerful data query and analysis functionality. Users can access the warehoused data through graphical interfaces, as well as programmatically through web services. Here we describe recent developments in ThaleMine, including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community, including the nematode, rat, mouse, zebrafish and budding yeast communities and the modENCODE project, and is also used for human data. ThaleMine is the first InterMine developed for a plant model. As additional plant InterMines are developed by the legume and other plant research communities, the potential for cross-organism integrative data analysis will be further enabled.
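To illustrate the programmatic access mentioned in the abstract, the following is a minimal sketch of how a request to an InterMine-style query web service might be assembled. It only builds the request URL and does not contact any server. The service root is a hypothetical placeholder (the abstract does not give the address), and the helper names `build_path_query` and `build_results_url`, the chosen output columns, and the example locus identifier are assumptions made for illustration rather than details taken from the paper.

```python
from urllib.parse import urlencode, parse_qs, urlsplit
import xml.etree.ElementTree as ET

# Hypothetical placeholder: substitute the real ThaleMine service root.
SERVICE_ROOT = "https://example.org/thalemine/service"


def build_path_query(views, constraints, model="genomic"):
    """Return an InterMine-style PathQuery XML string.

    views       -- list of output columns, e.g. ["Gene.symbol"]
    constraints -- list of (path, operator, value) tuples
    """
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def build_results_url(query_xml, service_root=SERVICE_ROOT, fmt="json"):
    """Return a URL for the query results endpoint with the query URL-encoded."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_root}/query/results?{params}"


# Illustrative query: look up one gene by its locus identifier (assumed example).
query_xml = build_path_query(
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    constraints=[("Gene.primaryIdentifier", "=", "AT1G01010")],
)
url = build_results_url(query_xml)
print(url)
```

The resulting URL could then be fetched with any HTTP client. Keeping query construction separate from the network call makes the query easy to inspect and reuse before it is sent to a live service.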

Keywords: Arabidopsis thaliana; InterMine; data integration; data warehouse; genomics; web services.

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis Proteins / genetics*
  • Arabidopsis Proteins / metabolism
  • Computational Biology / methods
  • Databases, Genetic*
  • Gene Expression Profiling*
  • Gene Expression Regulation, Plant / genetics*
  • Gene Ontology
  • Genomics / methods
  • Information Storage and Retrieval / methods
  • Internet
  • Protein Interaction Mapping / methods
  • Protein Interaction Maps / genetics
  • Reproducibility of Results
  • Sequence Analysis, RNA


Substances

  • Arabidopsis Proteins