Sci Rep. 2021 Dec 10;11(1):23788.
doi: 10.1038/s41598-021-02827-6.

Using imputation to provide harmonized longitudinal measures of cognition across AIBL and ADNI

Rosita Shishegar et al. Sci Rep.
Free PMC article

Abstract

To improve understanding of Alzheimer's disease, large observational studies are needed to increase power for more nuanced analyses. Combining data across existing observational studies represents one solution. However, the disparity between such datasets makes this a non-trivial task. Here, a machine learning approach was applied to impute longitudinal neuropsychological test scores across two observational studies, the Australian Imaging, Biomarkers and Lifestyle Study (AIBL) and the Alzheimer's Disease Neuroimaging Initiative (ADNI), providing an overall harmonized dataset. MissForest, a machine learning algorithm, capitalizes on the underlying structure and relationships in the data to impute test scores that were not measured in one study, aligning that study with the other. Results demonstrated that simulated missing values from one dataset could be accurately imputed, and that imputed values for data actually missing in one dataset showed comparable discrimination (p < 0.001) for clinical classification to measured data in the other dataset. Further, the increased power of the overall harmonized dataset was demonstrated by a significant association between CVLT-II test scores (imputed for ADNI) and PET Amyloid-β in MCI APOE-ε4 homozygotes in the imputed data (N = 65) that was not observed in the original AIBL dataset (N = 11). These results suggest that MissForest can provide a practical solution for data harmonization using imputation across studies to improve power for more nuanced analyses.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Performance of imputation for simulated missing AIBL LMII scores with different sizes of training and missing data. Performance is calculated as the mean absolute error (MAE) between imputed and actual data. Training samples were drawn from the ADNI dataset at sizes equal to 10%, 50%, and 100% of the AIBL dataset, and simulated missing samples covered 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the AIBL dataset. The results show high prediction accuracy even when the training (reference) dataset is ten times smaller than the joining dataset.
Figure 2
Performance of imputation for simulated missing AIBL MMSE scores with different sizes of training and missing data. Performance is calculated as the mean absolute error (MAE) between imputed and actual data. Training samples were drawn from the ADNI dataset at sizes equal to 10%, 50%, and 100% of the AIBL dataset, and simulated missing samples covered 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the AIBL dataset. The results show high prediction accuracy even when the training (reference) dataset is ten times smaller than the joining dataset.
Figure 3
Performance of imputation for simulated missing AIBL LMII scores with different sizes of training and missing data. Performance is calculated as the correlation between imputed and actual data. Training samples were drawn from the ADNI dataset at sizes equal to 10%, 50%, and 100% of the AIBL dataset, and simulated missing samples covered 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the AIBL dataset. The results show high prediction accuracy even when the training (reference) dataset is ten times smaller than the joining dataset.
Figure 4
Performance of imputation for simulated missing AIBL MMSE scores with different sizes of training and missing data. Performance is calculated as the correlation between imputed and actual data. Training samples were drawn from the ADNI dataset at sizes equal to 10%, 50%, and 100% of the AIBL dataset, and simulated missing samples covered 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the AIBL dataset. The results show high prediction accuracy even when the training (reference) dataset is ten times smaller than the joining dataset.
Figure 5
ADNI data imputed: distribution of the actual AIBL CVLT-II Total Immediate Recall scores and the imputed ADNI CVLT-II Total Immediate Recall scores for each clinical classification. AIBL data imputed: distribution of the actual ADNI RAVLT Total Immediate Recall score and the imputed AIBL RAVLT Total Immediate Recall score for each clinical classification.
Figure 6
The association between Aβ level and memory performance measured with CVLT-II and RAVLT total immediate recall memory scores. Harmonized data include AIBL and ADNI data, presented in blue and black, respectively.
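
The simulated-missingness evaluation summarized in the captions of Figures 1 to 4 (hide known scores, impute them, then compare imputed and actual values by MAE and by correlation) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the data are synthetic, and `fit_line_imputer` is a simple least-squares stand-in, not MissForest. The sketch also hides and imputes values within a single toy dataset, whereas the study trains on ADNI samples and imputes simulated missing AIBL scores.

```python
import math
import random


def mean_absolute_error(actual, imputed):
    """Average absolute difference between paired actual and imputed scores."""
    return sum(abs(a - b) for a, b in zip(actual, imputed)) / len(actual)


def pearson_correlation(actual, imputed):
    """Pearson correlation coefficient between paired actual and imputed scores."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_b = sum(imputed) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_b = sum((b - mean_b) ** 2 for b in imputed)
    return cov / math.sqrt(var_a * var_b)


def fit_line_imputer(rows):
    """Hypothetical stand-in imputer (NOT MissForest).

    Fits target ~ predictor by least squares on the complete rows, then
    fills in every row whose target is None.
    """
    complete = [(x, y) for x, y in rows if y is not None]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [(x, y if y is not None else intercept + slope * x) for x, y in rows]


def evaluate_imputer(rows, fraction, impute, seed=0):
    """Hide a random fraction of targets, impute them, and score the result.

    Returns (MAE, Pearson r), both computed on the hidden entries only.
    """
    rng = random.Random(seed)
    n_hide = max(1, round(fraction * len(rows)))
    hidden = sorted(rng.sample(range(len(rows)), n_hide))
    hidden_set = set(hidden)
    masked = [(x, None if i in hidden_set else y) for i, (x, y) in enumerate(rows)]
    completed = impute(masked)
    actual = [rows[i][1] for i in hidden]
    imputed = [completed[i][1] for i in hidden]
    return mean_absolute_error(actual, imputed), pearson_correlation(actual, imputed)


if __name__ == "__main__":
    # Synthetic (predictor, target) pairs; these are not study data.
    data_rng = random.Random(42)
    rows = [(float(x), 2.0 * x + data_rng.gauss(0.0, 1.0)) for x in range(30)]
    for fraction in (0.1, 0.5, 0.9):
        mae, r = evaluate_imputer(rows, fraction, fit_line_imputer)
        print(f"hidden={fraction:.0%}  MAE={mae:.2f}  r={r:.2f}")
```

Scoring only the hidden entries keeps the metrics from being inflated by values the imputer never had to predict. Any imputer that accepts rows with `None` targets and returns completed rows could be substituted for the stand-in.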
