Fostering population-based cohort data discovery: The Maelstrom Research cataloguing toolkit

PLoS One. 2018 Jul 24;13(7):e0200926. doi: 10.1371/journal.pone.0200926. eCollection 2018.


Background: The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize usage of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues.

Methods: Development of the Maelstrom Research cataloguing toolkit was initiated in 2004. It was supported by the exploration of existing catalogues and standards, and guided by input from partner initiatives having used or pilot tested incremental versions of the toolkit.

Results: The cataloguing toolkit is built upon two main components: a metadata model and a suite of open-source software applications. The model sets out specific fields to describe study profiles; characteristics of the subpopulations of participants; timing and design of data collection events; and datasets/variables collected at each data collection event. It also includes the possibility to annotate variables with different classification schemes. When combined, the model and software support implementation of study and variable catalogues and provide a powerful search engine to facilitate data discovery.
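The hierarchy described in the Results (study profile, subpopulations, data collection events, datasets/variables, and variable annotations) can be pictured as nested records. The sketch below is only an illustration of that structure. Every class name, field name, and the `find_variables` helper are hypothetical and are not taken from the Maelstrom Research metadata model or its software. The linear scan stands in for the toolkit's search engine, whose actual implementation is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# All names below are hypothetical; they are not the toolkit's actual schema.
@dataclass
class Variable:
    name: str
    label: str
    # Maps a classification scheme to a term, e.g. {"area": "lifestyle"}.
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class DataCollectionEvent:
    name: str
    start_year: int
    end_year: int
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Subpopulation:
    name: str
    description: str
    events: List[DataCollectionEvent] = field(default_factory=list)


@dataclass
class Study:
    acronym: str
    design: str
    subpopulations: List[Subpopulation] = field(default_factory=list)


def find_variables(
    studies: List[Study], scheme: str, term: str
) -> List[Tuple[str, str, str]]:
    """Return (study acronym, event name, variable name) for each variable
    annotated with `term` under the classification scheme `scheme`."""
    hits = []
    for study in studies:
        for pop in study.subpopulations:
            for event in pop.events:
                for dataset in event.datasets:
                    for var in dataset.variables:
                        if var.annotations.get(scheme) == term:
                            hits.append((study.acronym, event.name, var.name))
    return hits


# Made-up example data, for illustration only.
demo = Study(
    acronym="DEMO",
    design="cohort",
    subpopulations=[
        Subpopulation(
            name="Adults",
            description="Participants aged 18 and over",
            events=[
                DataCollectionEvent(
                    name="Baseline",
                    start_year=2000,
                    end_year=2002,
                    datasets=[
                        Dataset(
                            name="questionnaire",
                            variables=[
                                Variable("smoking_status", "Current smoking status",
                                         {"area": "lifestyle"}),
                                Variable("height_cm", "Standing height (cm)",
                                         {"area": "physical_measures"}),
                            ],
                        )
                    ],
                )
            ],
        )
    ],
)
```

Calling `find_variables([demo], "area", "lifestyle")` walks the hierarchy and returns the study, event, and variable name of each variable carrying that annotation, which is the kind of cross-study lookup a variable catalogue is meant to support.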

Conclusions: The Maelstrom Research cataloguing toolkit already serves several national and international initiatives and the suite of software is available to new initiatives through the Maelstrom Research website. With the support of new and existing partners, we hope to ensure regular improvements of the toolkit.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies*
  • Data Analysis*
  • Databases, Factual
  • Epidemiologic Studies
  • Humans
  • Models, Statistical
  • Software
  • User-Computer Interface

Grant support

This work is supported by the European Union's Seventh Framework Program (261433, 602068, 313010, IF); Province of Quebec's 'Ministère de l'Économie, de la Science et de l'Innovation' (IF); National Institute on Aging (P01AG043362, IF); Canadian Partnership Against Cancer (IF); Canadian Institutes of Health Research (IF); Canadian Foundation for Innovation (IF); Ontario Institute for Cancer Research (IF); Genome Canada (IF); Genome Quebec (IF); Epigeny (YM). The funders provided support in the form of salaries for authors [IF, JB, DD, YM, VF], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.