2017 Jul;36(7):1385-1395.
doi: 10.1109/TMI.2017.2678483. Epub 2017 Mar 6.

Quantifying the Interaction and Contribution of Multiple Datasets in Fusion: Application to the Detection of Schizophrenia

Free PMC article

Yuri Levin-Schwartz et al. IEEE Trans Med Imaging.

Abstract

The extraction of information from multiple sets of data is a problem inherent to many disciplines. It can be approached either by analyzing the data sets jointly, as in data fusion, or by analyzing them separately and then combining the results, as in data integration. However, selecting the optimal method to combine and analyze multiset data is an ever-present challenge. The primary reason is the difficulty of determining the optimal contribution of each data set to an analysis, as well as the amount of potentially exploitable complementary information among data sets. In this paper, we propose a novel classification rate-based technique that unambiguously quantifies the contribution of each data set to a fusion result and facilitates direct comparisons of fusion methods on real data. We also apply a new method, independent vector analysis (IVA), to multiset fusion. The classification rate-based technique is applied to functional magnetic resonance imaging data collected from 121 patients with schizophrenia and 150 healthy controls during the performance of three tasks. Through this application, we find that although optimal performance is achieved by exploiting all tasks, the tasks do not contribute equally to the result, and the framework enables effective quantification of the value added by each task. Our results also demonstrate that data fusion methods are more powerful than data integration methods, with the former achieving a classification rate of 73.5% and the latter achieving one of 70.9%, a difference that we show is significant when all three tasks are analyzed together. Finally, we show that IVA, due to its flexibility, performs as well as or better than the popular data fusion method, joint independent component analysis.
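
One way to read the "value added by each task" idea is as a comparison of classification rates across subsets of the task datasets. The sketch below is a minimal illustration under that assumption and is not the authors' procedure: the function name `marginal_gains` and the averaging scheme are hypothetical, and every number in `RATES` is a made-up placeholder, not a result from the paper. Only the task labels (AOD, SIRP, SM) come from the figure captions.

```python
from itertools import combinations


def marginal_gains(rates, tasks):
    """Average change in classification rate when a task is added.

    rates maps a frozenset of task names to a classification rate.
    For each task, the gain is averaged over every non-empty subset
    of the remaining tasks.
    """
    gains = {}
    for task in tasks:
        others = [t for t in tasks if t != task]
        deltas = []
        for size in range(1, len(others) + 1):
            for subset in combinations(others, size):
                without = frozenset(subset)
                with_task = without | {task}
                deltas.append(rates[with_task] - rates[without])
        gains[task] = sum(deltas) / len(deltas)
    return gains


TASKS = ["AOD", "SIRP", "SM"]

# Placeholder rates for illustration only (NOT values from the paper).
RATES = {
    frozenset(["AOD"]): 0.60,
    frozenset(["SIRP"]): 0.62,
    frozenset(["SM"]): 0.58,
    frozenset(["AOD", "SIRP"]): 0.66,
    frozenset(["AOD", "SM"]): 0.63,
    frozenset(["SIRP", "SM"]): 0.64,
    frozenset(["AOD", "SIRP", "SM"]): 0.70,
}

gains = marginal_gains(RATES, TASKS)
for task in TASKS:
    print(f"{task}: average gain in classification rate = {gains[task]:.4f}")
```

With real classification rates in place of the placeholders, tasks with larger average gains would be the ones contributing more to the fused result under this particular definition.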

Figures

Fig. 1
Classification process for a single feature dataset. For the case where multiple datasets are analyzed, the ICA step is replaced with either jICA or IVA, performed on the concatenated feature datasets or the collection of feature datasets, respectively. The procedure is as follows: (a) The data are split into a training set, XTrain, and a test set, XTest. (b) The training dataset is dimension-reduced using PCA, ICA is run, and the discriminatory components, STrain, and corresponding subject covariations, ÃTrain, are selected. (c) In the final stage, ÃTrain is used to train the classifier, STrain is regressed onto XTest producing ÃTest, and ÃTest is used to test the classifier. This process is repeated N times and the mean classification rate is evaluated. Note that for jICA there is a single joint set of subject covariations, ATrain and ATest. For IVA as well as the data integration technique, there are dataset-specific subject covariations ATrain[k] and ATest[k]; however, for IVA the subject covariations are derived by fusing information across all datasets, whereas for the data integration technique the subject covariations are extracted from each dataset individually.
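Stage (c) and the final averaging can be written compactly. The expressions below are a sketch that assumes the usual linear ICA model (subjects by features data equal to subject covariations times components) and an ordinary least-squares regression; the caption does not name the estimator, so the pseudoinverse form and the symbol r_n for the classification rate of repetition n are assumptions.

```latex
X_{\mathrm{Train}} \approx \tilde{A}_{\mathrm{Train}} S_{\mathrm{Train}},
\qquad
\tilde{A}_{\mathrm{Test}}
  = X_{\mathrm{Test}} S_{\mathrm{Train}}^{\top}
    \left( S_{\mathrm{Train}} S_{\mathrm{Train}}^{\top} \right)^{-1},
\qquad
\bar{r} = \frac{1}{N} \sum_{n=1}^{N} r_n
```

Under this reading, the test subjects are never used to estimate the components: they are only projected onto components learned from the training subjects, which keeps the classifier evaluation free of information from the test set.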
Fig. 2
Average classification results using KSVM for individual datasets and combinations of datasets using either data fusion, with jICA and IVA-GL, or data integration using combined ICAs. The first three points from the left refer to the case where only one dataset is analyzed. The fourth, fifth, and sixth points from the left refer to combinations of two datasets. The rightmost point shows the classification performance when all three task datasets are jointly analyzed. Note that error bars are omitted for clarity, since the largest value of the standard error is 0.0035.
Fig. 3
Statistically significant components for the combination of the AOD and SIRP datasets. (a) The significant components obtained through the use of data fusion, using IVA-GL, are shown in the first two rows. The third and fourth rows contain those significant components obtained through the use of data integration, using ICA-EBM, that have a correlation above 0.5 with the components obtained using IVA-GL. The aligned components are in the same column. Those components obtained using ICA-EBM that do not have a correlation above 0.5 with any of the components obtained using IVA-GL are shown in (b) and (c) for the AOD and SIRP datasets, respectively. The maps have been flipped such that activation (red and orange) represents an increase in controls over patients and deactivation (blue) corresponds to a decrease in controls versus patients. The p-values for each component are located above the corresponding spatial map, and those that remain significant after a Bonferroni correction are displayed in green. All spatial maps are Z-maps thresholded at Z=2.7.
Fig. 4
Statistically significant components for the combination of the AOD and SM datasets. (a) The significant components obtained through the use of data fusion, using IVA-GL, are shown in the first two rows. The third and fourth rows contain those significant components obtained through the use of data integration, using ICA-EBM, that have a correlation above 0.5 with the components obtained using IVA-GL. The aligned components are in the same column. Those components obtained using ICA-EBM that do not have a correlation above 0.5 with any of the components obtained using IVA-GL are shown in (b) and (c) for the AOD and SM datasets, respectively. The maps have been flipped such that activation (red and orange) represents an increase in controls over patients and deactivation (blue) corresponds to a decrease in controls versus patients. The p-values for each component are located above the corresponding spatial map, and those that remain significant after a Bonferroni correction are displayed in green. All spatial maps are Z-maps thresholded at Z=2.7.
