PLoS Comput Biol. 2021 Jun 28;17(6):e1009136.
doi: 10.1371/journal.pcbi.1009136. eCollection 2021 Jun.

Multidimensional analysis and detection of informative features in human brain white matter

Adam Richie-Halford et al. PLoS Comput Biol.

Abstract

The white matter contains long-range connections between different brain regions, and the organization of these connections holds important implications for brain function in health and disease. Tractometry uses diffusion-weighted magnetic resonance imaging (dMRI) to quantify tissue properties along the trajectories of these connections. Statistical inference from tractometry usually either averages these quantities along the length of each fiber bundle or computes regression models separately for each point along every one of the bundles. The former approach is limited in its sensitivity, the latter in its statistical power. We developed a method based on the sparse group lasso (SGL) that takes into account tissue properties along all of the bundles and selects informative features by enforcing both global and bundle-level sparsity. We demonstrate the performance of the method in two settings. (i) In a classification setting, patients with amyotrophic lateral sclerosis (ALS) are accurately distinguished from matched controls. Furthermore, SGL identifies the corticospinal tract as important for this classification, correctly finding the parts of the white matter known to be affected by the disease. (ii) In a regression setting, SGL accurately predicts "brain age." In this case, the weights are distributed throughout the white matter, indicating that many different regions of the white matter change over the lifespan. Thus, SGL leverages the multivariate relationships between diffusion properties in multiple bundles to make accurate phenotypic predictions while simultaneously discovering the most relevant features of the white matter.
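
For orientation, the sparse group lasso is usually written as a loss plus two penalties. The form below is the standard one from the general SGL literature, not a quotation from this abstract; the symbols $\mathcal{L}$, $\lambda$, $\alpha$, $G$, and $p_g$ are assumptions introduced here, and the exact loss and group weights used by the authors are not stated in this record.

```latex
\hat{\beta} = \arg\min_{\beta}\;
    \mathcal{L}(y, X\beta)
    + \lambda\,\alpha\,\lVert \beta \rVert_1
    + \lambda\,(1 - \alpha) \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2
```

Here $\beta^{(g)}$ collects the $p_g$ coefficients of bundle $g$, $\lambda \ge 0$ sets the overall penalty strength, and $\alpha \in [0, 1]$ trades off the two terms. The $\ell_1$ term encourages global sparsity (individual coefficients go to zero), while the group-wise $\ell_2$ term encourages bundle-level sparsity (whole bundles go to zero).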

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Tractometry data flow.
(a) Whole-brain tractography generates streamlines approximating the trajectories of white matter connections. (b) Tractometry classifies these streamlines into anatomical bundles. In this case, we show the left corticospinal tract (CSTL) and the left arcuate fasciculus (ARCL) over a mid-sagittal anatomical slice. Tract profiling further extracts bundle profiles, quantifications of various diffusion metrics along the length of the fiber bundle. Here, we show one subject’s fractional anisotropy (FA) profile for (c) the CSTL and (d) the ARCL. (e) The phenotypical target data and tract profile features can be organized into a linear model, $\hat{y} = X\hat{\beta}$. The feature matrix $X$ is color-coded to reveal a natural group structure: the left (orange) group contains k features from the CSTL, the middle (green) group contains k features from the left cingulum cingulate (CGCL), and the right (blue) group contains k features from the ARCL. The coefficients in $\hat{\beta}$ follow the same natural grouping. Panels (a) and (b) are adapted from https://figshare.com/articles/figure/example_tractography-segmentation/14485350, and reproduced under the CC-BY license (https://creativecommons.org/licenses/by/4.0/).
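
The grouped layout in panel (e) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the sample size, the value of k, and the random numbers standing in for FA profiles are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects = 5   # hypothetical sample size
k = 100          # nodes per bundle profile (assumed; the caption only says "k features")
bundles = ["CSTL", "CGCL", "ARCL"]

# One (n_subjects, k) array per bundle; random values stand in for FA profiles.
profiles = {name: rng.uniform(0.2, 0.8, size=(n_subjects, k)) for name in bundles}

# Concatenate profiles column-wise so each bundle forms a contiguous group in X.
X = np.concatenate([profiles[name] for name in bundles], axis=1)

# groups[i] holds the column indices of X that belong to bundles[i].
groups = [np.arange(i * k, (i + 1) * k) for i in range(len(bundles))]

# A coefficient vector with the same grouping; only the CSTL group is non-zero here.
beta_hat = np.zeros(X.shape[1])
beta_hat[groups[0]] = rng.normal(size=k)

# Linear model: one prediction per subject.
y_hat = X @ beta_hat
```

Because only the first group of `beta_hat` is non-zero, `y_hat` depends on the CSTL columns alone, which is the kind of bundle-level sparsity the group structure makes possible.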
Fig 2. PCR-SGL accurately and interpretably predicts ALS diagnosis.
(a) Classification probabilities for ALS diagnosis, with controls on the left, patients on the right, predicted controls in blue, and predicted patients in orange. That is, orange dots on the left represent false positives, while blue dots on the right represent false negatives. We achieve 83% accuracy with an ROC AUC of 0.88. (b) PCR-SGL coefficients are presented on the core fibers of major fiber bundles. They exhibit high group sparsity and are concentrated in the FA of the corticospinal tract (CST). The brain is oriented with the right hemisphere in the foreground and anterior to the right of the page. The CSTL, CSTR, callosum forceps anterior (CFA), left arcuate (ARCL), and right arcuate (ARCR) bundles are indicated for orientation. (c) PCR-SGL identifies three portions of the CST as important, where $\hat{\beta}$ (dashed line, right axis) has large values. These are centered around nodes 30, 65, and 90, corresponding to locations of substantial differences in FA between the ALS and control groups (shaded areas indicate standard error of the mean). (d) Bundle profiles for false positive classifications. Line colors correspond to the marker edge color in the top left plot. These individuals have reduced FA in the CST portions that SGL identified as important. Their misclassification is coherent with the feature importance and the group differences in FA. (e) Individual bundle profiles for false negative classifications. These individuals have bundle profiles that oscillate between the group means.
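
The two summary numbers in panel (a), accuracy and ROC AUC, can be computed from predicted probabilities as in the sketch below. The 0.5 decision threshold and the toy labels and probabilities are assumptions for illustration; they are not the study's data.

```python
import numpy as np

def accuracy(y_true, prob, threshold=0.5):
    """Fraction of subjects whose thresholded prediction matches the label."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(prob) >= threshold).astype(int)
    return float(np.mean(pred == y_true))

def roc_auc(y_true, prob):
    """Probability that a random patient scores higher than a random control.

    Ties count as one half, which matches the usual rank-based definition.
    """
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    pos = prob[y_true == 1]
    neg = prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy example: 0 = control, 1 = patient (values are made up).
y_true = [0, 0, 0, 1, 1, 1]
prob = [0.1, 0.4, 0.6, 0.3, 0.7, 0.9]

acc = accuracy(y_true, prob)
auc = roc_auc(y_true, prob)
```

In this toy example the control with probability 0.6 is a false positive and the patient with probability 0.3 is a false negative, mirroring the orange-on-left and blue-on-right dots described in the caption.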
Fig 3. Predicting age with tractometry and SGL.
(top) The predicted age vs. true age of each individual from the test splits (i.e., when each subject’s data was held out in fitting the model) for the (a) WH, (b) HBN, and (c) Cam-CAN datasets; an accurate prediction falls close to the y = x line (dashed). The mean absolute error (MAE) and coefficient of determination $R^2$ are presented in the lower right of each scatter plot. (middle) Feature importance for predicting age from tract profiles in the (d) WH, (e) HBN, and (f) Cam-CAN datasets. The orientation of the brain is the same as in Fig 2b; however, because the coefficients exhibit high global sparsity (as opposed to group sparsity), we plot the mean of the absolute value of $\hat{\beta}$ for each bundle on the core fiber. The global distribution of the $\hat{\beta}$ coefficients reflects the fact that aging is not confined to a single white matter bundle. (bottom) Age quintile bundle profiles for the (g) WH, (h) HBN, and (i) Cam-CAN datasets.
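
The quantities named in this caption (MAE, $R^2$, and the per-bundle mean of the absolute coefficients) have simple definitions, sketched here with NumPy. The function names and the toy numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted age."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def bundle_importance(beta_hat, groups):
    """Mean of |beta_hat| within each bundle (one value per group)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.array([np.mean(np.abs(beta_hat[idx])) for idx in groups])

# Toy ages and predictions (made up).
age_true = [10, 20, 30, 40]
age_pred = [12, 18, 33, 39]
mae = mean_absolute_error(age_true, age_pred)
r2 = r_squared(age_true, age_pred)

# Toy coefficients for three bundles with two features each.
importance = bundle_importance([1, -1, 0, 0, 2, -4], [[0, 1], [2, 3], [4, 5]])
```

Taking the absolute value before averaging keeps positive and negative coefficients within a bundle from cancelling, so a bundle with large weights of mixed sign still shows up as important.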
Fig 4. Model performance across all datasets.
Each panel shows model performance measured on the test set for each cross-validation split, with each black dot representing a split, box plots representing the quartiles, and white diamonds representing the mean performance. The y-scale varies in each subplot. (a) Accuracy of test set predictions for the ALS dataset. Because group differences in ALS diagnosis are mostly confined to a single bundle, the group structure-preserving methods, SGL and PCR-SGL, outperform the other models. The remaining frames show the coefficient of determination, $R^2$, in test sets for the (b) WH, (c) HBN, and (d) Cam-CAN datasets. Because aging affects the white matter globally, group structure-blind methods like elastic net and PCR Lasso perform well. Nonetheless, the SGL models show competitive predictive performance, adapting to a problem where group structure is not as informative. PCR-SGL performs poorly in this regime because its initial group-wise PC projection destroys between-bundle covariance. The bundle-mean lasso performs poorly, demonstrating the value of along-tract profiling.
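
To make the phrase "group-wise PC projection" concrete, here is a minimal sketch in which each bundle's columns are centered and projected onto that bundle's own leading principal components. It is an assumption about what such a step might look like, not the authors' implementation; the function name `groupwise_pca` and the choice of two components are hypothetical.

```python
import numpy as np

def groupwise_pca(X, groups, n_components):
    """Project each group's columns onto that group's own principal components."""
    projected = []
    for idx in groups:
        # Center this bundle's features; other bundles are never consulted.
        Xg = X[:, idx] - X[:, idx].mean(axis=0)
        # Rows of Vt are the principal axes, ordered by explained variance.
        _, _, Vt = np.linalg.svd(Xg, full_matrices=False)
        projected.append(Xg @ Vt[:n_components].T)
    return np.concatenate(projected, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))                              # 20 subjects, 3 bundles x 10 nodes
groups = [np.arange(i * 10, (i + 1) * 10) for i in range(3)]
Z = groupwise_pca(X, groups, n_components=2)               # shape (20, 6)
```

Each bundle's components are computed without reference to any other bundle, so directions of variation that are shared across bundles are not used when choosing the projection. In a real evaluation the projection would be fitted on the training split only and then applied to the test split.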
Fig 5. Nested cross-validation.
We evaluate model quality using a nested k-fold cross-validation scheme. At level-0, the input data is decomposed into $k_0$ shuffled groups and optimal hyperparameters are found for the level-0 training set. To avoid overfitting, the optimal hyperparameters are themselves evaluated using a cross-validation scheme taking place at level-1 of the decomposition, where each level-0 training set is further decomposed into $k_1 = 3$ shuffled groups. In the classification case, the training and test splits are stratified by diagnosis. For the ALS and WH data, $k_0 = 10$, while for the HBN and Cam-CAN data, $k_0 = 5$.
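
The nested scheme described above can be sketched as two loops. To keep the example self-contained, closed-form ridge regression is swapped in for the SGL model, the hyperparameter grid `alphas` is hypothetical, and stratification (used in the classification case) is omitted; only the level-0 and level-1 structure follows the caption.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split a shuffled range(n) into k (train, test) index pairs."""
    folds = np.array_split(rng.permutation(n), k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression (a stand-in for SGL, no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def nested_cv(X, y, alphas, k0=5, k1=3, seed=0):
    """Return one held-out R^2 score per level-0 fold."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for train, test in kfold_indices(len(y), k0, rng):
        # Level 1: pick alpha using only the level-0 training set.
        inner = kfold_indices(len(train), k1, rng)
        mean_scores = []
        for alpha in alphas:
            scores = []
            for tr, va in inner:
                beta = ridge_fit(X[train[tr]], y[train[tr]], alpha)
                scores.append(r_squared(y[train[va]], X[train[va]] @ beta))
            mean_scores.append(np.mean(scores))
        best_alpha = alphas[int(np.argmax(mean_scores))]
        # Level 0: refit on the whole training set, score on the held-out fold.
        beta = ridge_fit(X[train], y[train], best_alpha)
        outer_scores.append(r_squared(y[test], X[test] @ beta))
    return np.array(outer_scores)

# Synthetic data (made up): 60 subjects, 5 features, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=60)
scores = nested_cv(X, y, alphas=[0.01, 1.0, 100.0], k0=5, k1=3)
```

The key property is that the level-0 test fold is never seen while hyperparameters are chosen, so the reported scores are not inflated by the hyperparameter search.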
