EBioMedicine. 2022 Nov:85:104292.
doi: 10.1016/j.ebiom.2022.104292. Epub 2022 Sep 28.

Causal analysis identifies small HDL particles and physical activity as key determinants of longevity of older adults

Virginia Byers Kraus et al. EBioMedicine. 2022 Nov.

Abstract

Background: The hard endpoint of death is one of the most significant outcomes in both clinical practice and research settings. Our goal was to discover direct causes of longevity from medically accessible data.

Methods: Using a framework that combines local causal discovery algorithms with discovery of maximally predictive and compact feature sets (the "Markov boundaries" of the response) and equivalence classes, we examined 186 variables and their relationships with survival over 27 years in 1507 participants, aged ≥71 years, of the longitudinal, community-based D-EPESE study.

Findings: As few as 8-15 variables predicted longevity at 2-, 5- and 10-years with predictive performance (area under receiver operator characteristic curve) of 0·76 (95% CIs 0·69, 0·83), 0·76 (0·72, 0·81) and 0·66 (0·61, 0·71), respectively. Numbers of small high-density lipoprotein particles, younger age, and fewer pack years of cigarette smoking were the strongest determinants of longevity at 2-, 5- and 10-years, respectively. Physical function was a prominent predictor of longevity at all time horizons. Age and cognitive function contributed to predictions at 5 and 10 years. Age was not among the local 2-year prediction variables (although significant in univariable analysis), thus establishing that age is not a direct cause of 2-year longevity in the context of measured factors in our data that determine longevity.
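The predictive performance above is reported as the area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the AUC can be computed as the fraction of (survivor, non-survivor) pairs in which the survivor receives the higher predicted score, with ties counted as half. The function name `auc` and the toy scores below are illustrative assumptions, not the study's code or data.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: predicted score for the positive class (higher = more likely)
    labels: 1 for the positive class (e.g. survived the horizon), 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores, not study data.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 5 of 6 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0·76 and 0·66.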

Interpretation: The discoveries in this study proceed from causal data science analyses of deep clinical and molecular phenotyping data in a community-based cohort of older adults with known lifespan.

Funding: NIH/NIA R01AG054840, R01AG12765, and P30-AG028716, NIH/NIA Contract N01-AG-12102 and NCRR 1UL1TR002494-01.

Keywords: Aging; Causal analysis; High-density lipoprotein; Inflammation; Longevity; Markov boundary; Physical activity.

Conflict of interest statement

Declaration of interests: Drs. Connelly and Otvos are employees of and own stock in Labcorp, the commercial provider of the NMR LipoProfile blood test. Additional institutional NIH funding is declared for Dr. Zhang (RO1 AG070146) and Dr. Ma (RO1AG070146 and RO1 HL153497) and consulting fees to Dr. Ma related to this work from the Duke Claude D. Pepper Older Americans Independence Center NIH/NIA P30-AG028716 grant. The remaining authors declare no competing interests. The funding sources provided funding only and had no role in writing of the manuscript or the decision to submit it for publication. No author has been paid to produce this manuscript. The authors were not precluded from accessing data in the study, and they accept responsibility to submit for publication.

Figures

Figure 1
A causal diagram illustrating the three main challenges in discovery of direct causes and Markov boundaries of longevity in simplified and idealized form. Rectangles indicate variables, arrows indicate direct causal relationships, e.g., A is a direct cause of longevity. Direct causes of longevity are of interest because they have the maximum combined causal effect on longevity and are more amenable to practical discovery than the whole causal network. Markov boundaries are useful because they have maximal predictivity for longevity and maximum compactness.
Challenge 1: Feature selection, dimensionality reduction and classifier inductive biases; mishandling of measured confounding. (i) Non-causal feature selection often introduces unnecessary features, resulting in unnecessarily large models. (ii) Such methods may also focus on non-causal variables that exhibit “information synthesis” (i.e., “signal aggregator” variables such as IS in the figure). Ranking by univariate association is one such commonly used method. (iii) Remote antecedent causes and related confounded variables (aka “passengers”) may also be preferentially selected (e.g., remote cause E and passenger P in the figure). (iv) It is also possible for various powerful predictor Machine Learning methods as well as classical statistical methods to be unable to differentiate between confounders and passengers and to assign them the same weights (e.g., Support Vector Machines and regularized regressors as well as Principal Component Analysis tend to view variables E and P, E and A, A and IS, as equally strong, despite the fact that they can be readily distinguished by conditional independence testing).
Challenge 2: Discovery methods that are oblivious to equivalence classes or that mishandle equivalences. Whenever a variable set has the exact same information about Longevity as another set, we say that they are Target Information Equivalent (TIE). For example, variables A and A’ are target information equivalent. This means that they have the same statistical information and characteristics with respect to Longevity. (i) Most analytic methods and protocols do not consider such equivalences and report a single member of the class (e.g., either A or A’ in the example). Because the equivalence class can be vast (i.e., exponential in the number of variables in the dataset), true local causes can easily be ignored. The larger the equivalence class, the larger the probability that the true causes will be missed. (ii) Also, collinearity analysis is sometimes misunderstood to handle the problem, but this is not the case: (a) collinearities examine 2 variables at a time whereas information equivalency often also exists at the variable set level; (b) highly collinear variables may not be information equivalent for Longevity; (c) weakly collinear variables may be information equivalent for Longevity.
Challenge 3: Unmeasured confounders. In the figure, H is an unmeasured confounder of K and Longevity. Algorithmic methods exist that, under distributional assumptions, can reveal some of the unmeasured confounders or can ensure that some variables are not confounded by unmeasured variables. However, no such methods exist in distributions with information equivalences (as is the case in our study).
The analysis methodology employed in the present study outputs local causes of Longevity: {A, B, K} and its equivalent {A’, B, K}. These sets are all Markov boundaries and thus have optimal predictive signal for Longevity and are minimal (i.e., maximally compact). They can thus be readily used to create optimal predictors. Except for unmeasured confounding of variable K, the methods used successfully avoid pitfalls and challenges outlined in Figure 1; of note, no non-experimental method exists to avoid this problem. The equivalency of A with A’ is identified and highlighted for further investigation. Passengers and information sinks plus remote causes are all filtered away. No measured direct causes are missed. Causal effects of all direct causes are correctly estimated. A relatively small number of experiments (up to the cardinality of the union of the local causal sets / Markov boundaries equivalence class) is needed to resolve both information equivalent sets and unmeasured confounding. Active learning algorithms exist to further reduce the number of experiments needed to resolve the equivalence class.
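The caption notes that passengers and direct causes can be separated by conditional independence testing even when univariate association ranks them similarly. The following is a minimal sketch of that idea on simulated data, under an assumed toy structure E → A → Y and E → P (a remote cause E, a direct cause A, a passenger P, and an outcome Y). The graph, the linear Gaussian mechanism, and all function names are illustrative assumptions. Partial correlation stands in here for a general conditional independence test and is not the algorithm used in the study.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulate the assumed toy graph: E -> A -> Y and E -> P.
rng = random.Random(0)
n = 5000
e = [rng.gauss(0, 1) for _ in range(n)]   # remote cause
a = [v + rng.gauss(0, 1) for v in e]      # direct cause of the outcome
p = [v + rng.gauss(0, 1) for v in e]      # passenger: shares E, no effect on Y
y = [v + rng.gauss(0, 1) for v in a]      # outcome, driven only by A

marginal_p = pearson(p, y)            # clearly non-zero: P looks predictive
p_given_a = partial_corr(p, y, a)     # close to zero: A screens P off from Y
a_given_p = partial_corr(a, y, p)     # stays large: P cannot explain A away
```

In this toy setting a univariate ranking would keep P, whereas conditioning on A shows that P carries no additional information about Y.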
Figure 2
Schematic representation of D-EPESE study timeline and analyses. The completed D-EPESE study was a longitudinal epidemiology assessment of locally representative community-dwelling older adults established in 1986, providing a wide range of measured clinical and molecular variables, and 33 years of subsequently documented survival data. Pertinent to this study, in-person (P1, P2 and P3) and telephone (T1&2, T3&4) interviews were conducted over 6 years from baseline in 1986. At P3, of those interviewed (2,569 survivors), 1727 provided consent for blood sampling and had a successful non-fasting blood draw; 1554 had blood stored for future use, and 1507 had both plasma and death data available for these analyses. Multiple (N=186) self-reported and clinically accessible measures (demographics, lifestyle and depression, physical activity and function, molecular biomarkers including clinical chemistries, haematological, lipids and metabolites by nuclear magnetic resonance spectroscopy (NMR), and medical conditions) were obtained. Mortality events were ascertained periodically by National Death Index searches; mortality was defined as death from any cause from 1992 (P3) through December 31, 2019 (27 years of follow-up after the blood was obtained) when the final National Death Index (NDI) search was performed. We modelled longevity at three different time horizons of clinical interest, each modelled separately, defined as a participant surviving more than or equal to 2 years, 5 years, and 10 years, respectively. The analytical pipeline included predictive modelling with a nested hold-out/cross-validation design, and Markov Boundary (MB, causal) analyses to identify local causes of longevity and their target information equivalent (TIE) classes (“signatures”) at different time horizons. 
We also derived a sepset (separating set) for each variable that was not part of any MB; sepset analysis elucidates which variables in an MB block the influence of, or subsume the information in, a particular variable that is not part of the MB.
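The three longevity outcomes described in this caption are binary labels derived from a single survival time. A minimal sketch of that labelling follows, assuming survival is expressed in years from the P3 blood draw. The function name, the constant, and the input format are hypothetical and do not describe the study's actual data layout.

```python
HORIZONS_YEARS = (2, 5, 10)  # time horizons named in the caption


def longevity_labels(survival_years, horizons=HORIZONS_YEARS):
    """Return {horizon: 1 or 0}, where 1 means the participant survived
    at least that many years (the caption's "more than or equal to")."""
    return {h: int(survival_years >= h) for h in horizons}


# Hypothetical participant who died 4.9 years after the blood draw.
example = longevity_labels(4.9)
```

Each horizon is then modelled as its own classification problem, which is why the selected variables can differ between the 2-, 5- and 10-year analyses.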
Figure 3
Variables that predict longevity. Variables that predict 2-year (top panel), 5-year (middle panel) and 10-year (lower panel) longevity. The ranges of estimated effect sizes are depicted for variables in at least one Markov boundary (MB). All molecular variables were derived from measures at the time of the blood draw (third in-person evaluation, P3). Green indicates variables that appeared in all MBs. Total sample size 1507. The box in the figure represents the range of the estimates for each variable from models with the variable present. The left edge of the box represents the first quartile (Q1), the right edge represents the third quartile (Q3), and the internal line represents the median. Left whisker = max(min(x), Q1 − 1.5 × IQR) and right whisker = min(max(x), Q3 + 1.5 × IQR), where IQR is the interquartile range.
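As a small worked example of the box-plot summary described in this caption, the sketch below computes the quartiles and whiskers for a list of effect-size estimates. It assumes the common Tukey convention, in which each whisker is clipped to at most 1.5 × IQR beyond the box, and it assumes the "inclusive" quartile method of Python's `statistics.quantiles`. The plotting software used for the figure may interpolate quartiles differently, and the function name and sample values are hypothetical.

```python
import statistics


def box_stats(values):
    """Quartiles and Tukey-style whisker positions for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers extend to the data extremes, but no further than 1.5 * IQR
    # beyond the corresponding edge of the box.
    left_whisker = max(min(values), q1 - 1.5 * iqr)
    right_whisker = min(max(values), q3 + 1.5 * iqr)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "left_whisker": left_whisker,
        "right_whisker": right_whisker,
    }


# Hypothetical estimates with one extreme value on the right.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

Here the extreme value 30 lies beyond Q3 + 1.5 × IQR, so the right whisker stops at that limit instead of reaching the maximum.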
