Maximizing the utility of public data

Front Genet. 2023 Mar 31:14:1106631. doi: 10.3389/fgene.2023.1106631. eCollection 2023.

Abstract

The human genome project galvanized the scientific community around an ambitious goal. Upon completion, the project delivered several discoveries, and a new era of research commenced. More importantly, novel technologies and analysis methods materialized during the project period. The cost reduction allowed many more labs to generate high-throughput datasets. The project also served as a model for other extensive collaborations that generated large datasets. These datasets were made public and continue to accumulate in repositories. As a result, the scientific community should consider how these data can be utilized effectively for the purposes of research and the public good. A dataset can be re-analyzed, curated, or integrated with other forms of data to enhance its utility. We highlight three important areas to achieve this goal in this brief perspective. We also emphasize the critical requirements for these strategies to be successful. We draw on our own experience and others in using publicly available datasets to support, develop, and extend our research interest. Finally, we underline the beneficiaries and discuss some risks involved in data reuse.

Keywords: data-analysis; data-reuse; data-sharing; public-data; reproducible-research.

Grants and funding

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) of the Korea government [2020R1A2C2011416] and by the Commercializations Promotion Agency for R&D Outcomes (COMPA) grant funded by the Korea government (MSIT) [1711173796].