Osteoarthritis Data Integration Portal (OsteoDIP): A web-based gene and non-coding RNA expression database

Osteoarthr Cartil Open. 2022 Jan 27;4(1):100237. doi: 10.1016/j.ocarto.2022.100237. eCollection 2022 Mar.


Objective: OsteoDIP aims to collect and provide, in a simple searchable format, curated high-throughput RNA expression data related to osteoarthritis.

Design: Datasets are collected annually by searching PubMed for "osteoarthritis gene expression profile". Only publications containing patient data and a list of differentially expressed genes are considered. Since 2020, the search has been expanded to include non-coding RNAs. In addition, GEO has been searched for "osteoarthritis" datasets using the 'Homo sapiens' and 'Expression profiling by array' filters. Annotations for genes linked to osteoarthritis have been downloaded from external databases.
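
The two searches described in the Design can be sketched programmatically. The abstract does not say how the queries were run, so the following is only an illustration that assumes the NCBI E-utilities ESearch endpoint; the helper name `build_esearch_url`, the `retmax` value, and the field-tagged form of the GEO filters are assumptions, not the authors' procedure. The code only builds the request URLs and sends nothing over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed access route; the abstract does not name one).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Return an ESearch URL for the given database and query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Literature search, using the phrase quoted in the Design paragraph.
pubmed_url = build_esearch_url("pubmed", "osteoarthritis gene expression profile")

# GEO DataSets search (Entrez database "gds"); the two filters are written
# as field-tagged terms, which is an assumption about the query syntax used.
geo_term = (
    'osteoarthritis AND "Homo sapiens"[Organism] '
    'AND "Expression profiling by array"[DataSet Type]'
)
geo_url = build_esearch_url("gds", geo_term)

print(pubmed_url)
print(geo_url)
```

Each URL could then be fetched with any HTTP client; the returned identifiers would still need the manual curation step described above (patient data and a list of differentially expressed genes).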

Results: Out of 1204 curated papers, 63 have been included in OsteoDIP, while GEO curation led to the collection of 28 datasets. Literature data provide a snapshot of osteoarthritis research derived from 1924 human samples, while GEO datasets provide expression data for an additional 1012 patients. As in the osteoarthritis literature overall, OsteoDIP data come mostly from studies focused on the knee, and the most frequently investigated tissue is cartilage. GEO datasets were fully integrated with their associated clinical data. We showcase examples and use cases applicable to translational research in osteoarthritis.

Conclusions: OsteoDIP is publicly available at http://ophid.utoronto.ca/OsteoDIP. The website is easy to navigate and all the data is available for download. Data consolidation allows researchers to perform comparisons across studies and to combine data from different datasets. Our examples show how OsteoDIP can integrate with and improve osteoarthritis researchers' pipelines.
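
As a rough illustration of the cross-study comparison mentioned above, the sketch below counts how many studies report each gene as differentially expressed. It assumes the downloaded data have already been reduced to one gene set per study; the study labels, the placeholder gene identifiers, and the function `genes_by_support` are hypothetical and do not reflect OsteoDIP's actual download format or contents.

```python
from collections import Counter

# Hypothetical input: study label -> set of genes reported as differentially
# expressed. Placeholder identifiers only; not real OsteoDIP content.
studies = {
    "study_1": {"GENE_A", "GENE_B", "GENE_C"},
    "study_2": {"GENE_B", "GENE_C", "GENE_D"},
    "study_3": {"GENE_C", "GENE_E"},
}


def genes_by_support(gene_sets, min_studies=2):
    """Return (gene, study count) pairs for genes seen in at least min_studies studies."""
    counts = Counter(gene for genes in gene_sets.values() for gene in genes)
    recurrent = {gene: n for gene, n in counts.items() if n >= min_studies}
    # Most widely supported genes first; ties broken alphabetically.
    return sorted(recurrent.items(), key=lambda item: (-item[1], item[0]))


recurrent = genes_by_support(studies)
print(recurrent)
```

Counting by study rather than by sample keeps one large dataset from dominating the ranking, which is one reasonable choice among several for combining lists from different sources.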

Keywords: Data integration; GEO, Gene Expression Omnibus; GWAS, Genome Wide Association Studies; Gene expression; HGNC, HUGO Gene Nomenclature Committee; IID, Integrated Interactions Database; Long non-coding RNA; NAViGaTOR, Network Analysis, Visualization, & Graphing TORonto; OA, osteoarthritis; OsteoDIP, Osteoarthritis Data Integration Portal; TCGA, The Cancer Genome Atlas; integrative computational biology; microRNA; mirDIP, MicroRNA Data Integration Portal; pathDIP, Pathway Data Integration Portal.