J Biomed Inform. 2017 Jul;71:49-57.
doi: 10.1016/j.jbi.2017.05.006. Epub 2017 May 10.

Developing a Framework for Digital Objects in the Big Data to Knowledge (BD2K) Commons: Report From the Commons Framework Pilots Workshop

Free PMC article

Kathleen M Jagodnik et al. J Biomed Inform.


The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges, including the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.

Keywords: Accessibility; Big Data; FAIR principles; Findability; Interoperability; Reusability.


Figure 1. The Findability, Accessibility, Interoperability, and Reusability (FAIR) principles in the context of software harmonization, organization of methods, metadata management, hardware infrastructure, resource allocation, and usability
Organization of Methods illustrates crowdsourcing efforts to establish benchmarks for pipeline and algorithm performance. Metadata Management can include hybrid indexing that pairs manual submissions by users with automated analyses (bottom-up and top-down approaches); metadata standards and forms are employed to implement this concept. Hardware Infrastructure includes cloud-based storage and high-performance computing solutions. Resource Allocation employs a cloud computing credits model in which funds for computational resources are allocated based on need and cost. Usability considerations include training and education in the use of digital resources, employing interactive notebooks to allow reproducible and open analyses, and developing interactive data visualizations that permit dynamic modification of displays for different data views. Software Harmonization facilitates compatibility between application programming interfaces (APIs), and Docker containers can encapsulate implementation details to facilitate the management, reuse, and indexing of tool and data repositories.
Figure 2. Workflows for biomedical research involving Big Data
Wet bench experiments collect measurements of cellular and tissue variables under different conditions and at different time points; the resulting data are processed via pipelines that run a series of sequential steps. Individual analysis steps can be benchmarked, so that the quality of competing pipelines is compared objectively against a common benchmark. At the final step of the analysis, data are visualized as interactive web-based figures and integrated with other data using statistical mining approaches such as correlation, enrichment, and network analyses. The publications or other final products that result from the analyses are hosted on platforms that include PubMed, DataMed, and GEO. These repositories facilitate reuse and integration. Data, tools, and pipelines are hosted on the cloud.
