Front Neuroinform. 2017 Mar 16;11:18.
doi: 10.3389/fninf.2017.00018. eCollection 2017.

Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework

Free PMC article

Antoine Grigis et al. Front Neuroinform. 2017.

Abstract

In neurosciences or psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection, across different institutions and countries, to final data publication service, one must handle the massive, heterogeneous, and complex data from genetics, imaging, demographics, or clinical scores. These data must be both efficiently obtained and downloadable. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into a (1) multi-center upload platform, (2) collaborative quality assessment platform, and (3) publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts.

Keywords: Python; data sharing; database; genetics; medical informatics; neuroimaging; web service.

Figures

Figure 1
Architecture of a CubicWeb data sharing service (DSS) integrated in an Apache platform with LDAP. The business logic cubes provide a schema that can be instantiated in the database management system (DBMS: red puzzle piece). The system cubes ensure low-level system interactions (green puzzle piece), and the application cube proposes a web user interface (blue puzzle piece). End users access the database content through a web browser, a Python API scripting the DSS or an FTP solution, where virtual folders (acting as filters on the central repository) are proposed for download.
Figure 2
Illustration of the upload process. The (A) syntax of a form description JSON file, (B) corresponding web form as presented to users (here an error message returned by synchronous validation is displayed in the top red box), (C) “Quarantine” status, and (D) “Validated” status (obtained after asynchronous validation) as displayed to users: note that no feedback is shown here.
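The synchronous validation step in (B) can be pictured with a minimal Python sketch. The form-description layout, the field names (`subject_id`, `age`, `center`), and the `validate_submission` helper are all hypothetical; they are not the JSON syntax or the API used by the authors. The sketch only illustrates the general idea of checking submitted values against a JSON description and returning error messages that can be shown to the user immediately.

```python
import json

# Hypothetical form description. The authors' actual JSON syntax is shown in
# panel (A) of the figure and is not reproduced here.
FORM_DESCRIPTION = json.loads("""
{
  "fields": [
    {"name": "subject_id", "type": "str", "required": true},
    {"name": "age", "type": "int", "required": true, "min": 0, "max": 120},
    {"name": "center", "type": "str", "required": false}
  ]
}
""")

# Converters for the (assumed) field types of the description.
_CASTS = {"str": str, "int": int}


def validate_submission(description, submission):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for field in description["fields"]:
        name = field["name"]
        raw = submission.get(name)
        # Missing or empty values are only an error for required fields.
        if raw is None or raw == "":
            if field.get("required", False):
                errors.append(f"{name}: missing required value")
            continue
        # Convert the raw (string) value to the declared type.
        try:
            value = _CASTS[field["type"]](raw)
        except (TypeError, ValueError):
            errors.append(f"{name}: expected {field['type']}")
            continue
        # Optional range checks.
        if "min" in field and value < field["min"]:
            errors.append(f"{name}: must be >= {field['min']}")
        if "max" in field and value > field["max"]:
            errors.append(f"{name}: must be <= {field['max']}")
    return errors
```

In this sketch, a submission with no errors would move on to the "Quarantine" status, while a non-empty error list would be rendered in the red box at the top of the form.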
Figure 3
The collaborative quality control web service applied to a FreeSurfer segmentation of one subject. (A) The quality indicators (in this case, a controlled vocabulary with an "accept", "prescribe manual edit", or "reject" decision and an optional check-box justification), (B) a triplanar view of the white and pial surfaces overlaid on the anatomical image, and (C) the white and pial meshes with statistical indicators.
Figure 4
A snippet of the schema used in a publication DSS. We see from the green boxes that all entities are related to an “Assessment” entity through an “in_assessment” relation. This behavior is inherited from the access rights described in section 2.4.4.
Figure 5
Illustration of the download process via the proposed shopping cart mechanism. (A) The facet filter bar when all the scans (“Scan” entities) are requested (as highlighted in bold, the user has selected only the “FU2” time point and the diffusion MRI “DTI” scans), (B) the view corresponding to the filtered dataset, (C) the button that adds this new search to the cart (when these filtering options are activated, the saved RQL search path is updated automatically), (D) the newly created search, and (E) the download of the search and its associated files as presented in FileZilla.
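As a rough illustration of how the facet selection in (A) could translate into a saved RQL search, the following Python sketch assembles an RQL query string from the selected filters. The `Scan` entity and the `in_assessment` relation appear in the figures above; the attribute names `type` and `timepoint`, the facet keys, and the `build_scan_rql` helper are assumptions made for the example and may not match the actual schema or the way the framework generates its queries.

```python
def build_scan_rql(filters):
    """Build an RQL query selecting Scan entities that match facet filters.

    `filters` maps a facet name to the selected value, for example
    {"timepoint": "FU2", "scan_type": "DTI"}.
    """
    # Hypothetical mapping from facet name to an RQL restriction template.
    templates = {
        "scan_type": 'S type "{value}"',
        "timepoint": 'S in_assessment A, A timepoint "{value}"',
    }
    restrictions = ["S is Scan"]
    # Sort the facets so the generated query is deterministic.
    for facet in sorted(filters):
        if facet not in templates:
            raise KeyError(f"unknown facet: {facet}")
        value = str(filters[facet])
        # Refuse values that would break out of the quoted string literal.
        if '"' in value:
            raise ValueError(f"invalid character in value for {facet}")
        restrictions.append(templates[facet].format(value=value))
    return "Any S WHERE " + ", ".join(restrictions)


if __name__ == "__main__":
    # Selection highlighted in panel (A): FU2 time point, DTI scans.
    print(build_scan_rql({"timepoint": "FU2", "scan_type": "DTI"}))
```

With the selection from panel (A), the sketch produces `Any S WHERE S is Scan, S type "DTI", S in_assessment A, A timepoint "FU2"`; with no filter it falls back to `Any S WHERE S is Scan`, which corresponds to requesting all scans.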
Figure 6
Summary views of the database status. Global information, for example the (A) gender or (B) handedness distributions, (C) acquisition status, and (D) age distribution, or longitudinal information, such as (E) the answers of subject 2 to specific questions across the study time points.
