Dental Research Data Availability and Quality According to the FAIR Principles

J Dent Res. 2022 Oct;101(11):1307-1313. doi: 10.1177/00220345221101321. Epub 2022 Jun 2.

Abstract

According to the FAIR principles, data produced by scientific research should be findable, accessible, interoperable, and reusable, for instance, for use in machine learning algorithms. However, to date, there is no estimate of the quantity or quality of dental research data evaluated via the FAIR principles. We aimed to determine the availability of open data in dental research and to assess the compliance of shared dental research data with the FAIR principles (or FAIRness). We downloaded from Europe PubMed Central all available articles published as open access in PubMed-indexed dental journals from 2016 to 2021. In addition, we took a random sample of 500 dental articles that were not open access through Europe PubMed Central. We programmatically assessed data sharing in the articles and the compliance of shared data with the FAIR principles. Results showed that of 7,509 investigated articles, 112 (1.5%) shared data. The average (SD) level of compliance with the FAIR metrics was 32.6% (31.9%). The average (SD) for each category of metrics was as follows: findability, 3.4 (2.7) of 7; accessibility, 1.0 (1.0) of 3; interoperability, 1.1 (1.2) of 4; and reusability, 2.4 (2.6) of 10. No considerable changes in data sharing or in the quality of shared data occurred over the years. Our findings indicated that dental researchers rarely shared data, and when they did, the FAIR quality was suboptimal. Machine learning algorithms could understand 1% of available dental research data. These shortcomings undermine the reproducibility of dental research and hinder gaining the knowledge that can be gleaned from machine learning algorithms and applications.
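
As a rough arithmetic check on the figures quoted above, the following Python sketch recomputes the share of articles with shared data and an approximate overall compliance level from the per-category means. It is an illustration only, not the authors' assessment code: the variable names are hypothetical, and the total of 24 metrics is an assumption obtained by summing the category maxima stated in the abstract (7 + 3 + 4 + 10).

```python
# Figures quoted in the abstract.
total_articles = 7509
articles_sharing_data = 112

# (mean score, maximum score) per FAIR category, as reported in the abstract.
fair_means = {
    "findability": (3.4, 7),
    "accessibility": (1.0, 3),
    "interoperability": (1.1, 4),
    "reusability": (2.4, 10),
}

# Share of investigated articles that shared data.
share_pct = 100 * articles_sharing_data / total_articles

# Assumption: overall compliance is approximated as the summed category
# means divided by the summed category maxima.
mean_total = sum(mean for mean, _ in fair_means.values())
max_total = sum(maximum for _, maximum in fair_means.values())
overall_pct = 100 * mean_total / max_total

print(f"Articles sharing data: {share_pct:.1f}%")
print(f"Approximate overall FAIR compliance: {overall_pct:.1f}% "
      f"({mean_total:.1f} of {max_total} metrics)")
```

Under these assumptions the sketch gives about 1.5% for data sharing and about 32.9% for overall compliance, close to the reported 32.6%. The small difference may stem from rounding of the category means, or from the overall figure having been averaged per dataset rather than derived from the category means.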

Keywords: deep learning; machine learning; dental informatics; electronic dental records; open data; outcomes research.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Dental Research*
  • Europe
  • Information Dissemination* / methods
  • Machine Learning
  • Reproducibility of Results